12-06-2017 01:30 AM
Hello, my English may not be good, so I'm not sure how clear this will be. The situation is as follows: I have inherited a custom embedded device which uses a Renesas H8S MCU to store data on an SD card. It does this directly, with no filesystem, implementing SPI at a low level in firmware written in assembler. The original (MMC, pre-SPI) version worked for many years without a problem on 2GB SD cards.
A while back we started installing 8GB Kingston cards into the devices, as our traditional 2GB card choice was becoming scarce, so we had some modifications made to the firmware read/write routines to support the "new" SPI modes and timings. The developer got this working and thought it was all fine.
Now we have found that some of our data is being intermittently overwritten with blocks of 0x55, i.e. like 55555555555555555... (this is 85 decimal, or interestingly 01010101 in binary).
During normal operation, our firmware buffers some incoming data in RAM and NVRAM until it reaches a full 512-byte block, then it writes that single block to the SD card, keeping track of the next address. It keeps writing to subsequent 512-byte blocks, never backtracking, as it's appending to a log.
However, when we read the data back, sometimes the older data has been overwritten with 0x55 in chunks of some currently unknown size (at least 128 bytes, so probably 512). It seems these 0x55s are being written 'below' our data and sometimes overlap it.
We can debug the firmware (with HEW and the USB hardware emulator) but cannot see a problem in our code. I'm a bit blind to what's going on inside the SD card, as the only way I know to look at it is to remove it and image it (using
I'm pretty sure the data is corrupted on the card during the write or shortly after, and is not being read back wrongly. We have two fairly different read routines and they both return the same data. Debugging shows that the 0x55 data is coming off the card, not being corrupted later in the processing.
So I can only guess that it's a race condition, some internal caching or buffering in the SD card, or possibly the erase block size, which might mean the card is erasing larger areas than the 512-byte blocks we are writing.
The card is also used for some other purposes in other, lower reserved areas, with various duty cycles in our firmware, and also interrupts, so it's possible there's a conflict there, but I would have thought that would have shown up years ago.
It only happens with these 8GB cards. We haven't managed to try any other card variants/brands yet, but we plan to.
So my questions are:
Today I tested as many cards as I could find, and found all but one fail in similar, but sometimes different ways.
Of these cards, the "worst" was the 16GB SanDisk, which seemed unpredictable and even moved data around after it was written, as subsequent views of the data were different. The ADATA card behaved differently again: it needed two reads to return the true data. On the first read the data was only partially visible, but on the second read it looked correct.
Most of them behaved in the same way as described above, with old blocks looking like they were overwritten when subsequent writes moved into new blocks (probably). Some of them failed almost immediately, after one or two 512-byte block writes, much worse than our primary Kingston 8GB cards.
After watching the data strangely move around on these cards, I currently suspect the larger cards are "doing clever stuff" with the data (like buffering or caching) which we are only experiencing because we are writing/reading raw block addresses directly rather than going through a filesystem layer (even if there isn't a specific bug in our code).
Today I focused on two things: the block size suggestion, and examining the READ routine, prompted by thinking more about the fact that the data seems to be written correctly but read back wrongly only intermittently.
Firstly, I tried sending CMD16 to set the block size to 512, but:
a) it didn't have any effect on the failures (if it worked at all);
b) I'm not 100% sure I got it right in assembler, although while debugging I did get a "parameter error" bit set in the response, which I resolved and then got a 0x00 response, so I think I'm making the command call correctly;
c) on reading the Simplified SPI spec, it clearly says the block length is fixed at 512 for SDHC and this command is not used. So many people say you should set the block size - is that for MMC cards, or is the spec wrong?
Secondly, I examined the read routine and, as before, it seems generally correct. I previously tried adding more "dummy clocks" with no effect.
Sorry for the long novel; I'm a little dizzy from all this. Thank you for taking the time to read it!