**SPI SD Card (128GB) Fails After Several Hours – File System Corruption with FATFS**

Hello,

We are working on an STM32-based embedded system that logs parameter and alarm data to a 128GB SD card at fixed intervals using the FATFS library (initially v0.12, now updated to v0.15). The SD card is interfaced via SPI and formatted as FAT32 (we also tested exFAT for comparison).

Issue Description:

After flashing the firmware, the SD card operates reliably for several hours. However, beyond this duration, the SD card communication begins to fail — the filesystem cannot be mounted, and in many cases, the entire card becomes unreadable on both the MCU and a PC. This results in loss of all logged data.

This issue only occurs with the 128GB SD card (SanDisk SDXC). When using a 32GB SDHC card, the same firmware and hardware setup runs stably without any problems, even for extended periods.

What We’ve Tried:

  • Upgraded FATFS from v0.12 to v0.15
  • Reformatted the SD card using both FAT32 and exFAT file systems
  • Verified that all file handles are properly closed using f_close()
  • Power supply is stable — no unexpected shutdowns or brownouts
  • Ensured f_write() is used properly, but f_sync() is not currently being called after each write

Firmware Details:

We are using low-level disk I/O functions for SD card initialization and read/write (via SPI), adapted from the Elm-Chan FATFS diskio layer. Sample snippets are included below for reference (initialization, read, write).

Request for Guidance:

We would appreciate any help or suggestions on how to improve firmware robustness and ensure reliable long-term operation with large-capacity SD cards (especially SDXC over SPI). Specifically:

  1. Are there known reliability issues or special considerations when using SDXC (128GB) over SPI mode with FATFS?
  2. Is it essential to call f_sync() after each f_write() to avoid filesystem corruption in long-running systems?
  3. Are there recommendations for how often files should be opened/closed (e.g., once per write or kept open during operation)?
  4. Could SDXC cards behave differently with respect to CMD8/CMD58 or ACMD41 during initialization in SPI mode?
  5. Are there specific SD card brands or models known to work reliably in embedded applications?
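
As background for question 4, here is a minimal sketch of how the SPI-mode initialization responses are commonly decoded. It assumes the usual sequence (CMD8 with argument 0x1AA, ACMD41 with the HCS bit set, then CMD58 to read the OCR). The helper names are hypothetical and are not taken from our firmware or from the FatFs sample code. At this level SDHC and SDXC cards look the same: both report CCS = 1 and are addressed in 512-byte blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMD8_ARG        0x000001AAu   /* 2.7-3.6 V range, check pattern 0xAA */
#define ACMD41_HCS      0x40000000u   /* host supports high-capacity cards   */
#define OCR_POWER_UP    0x80000000u   /* bit 31: power-up sequence finished  */
#define OCR_CCS         0x40000000u   /* bit 30: card capacity status        */

/* Assemble the four bytes that follow R1 (MSB first) into one word. */
static uint32_t be32(const uint8_t b[4])
{
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* R7 after CMD8: the low 12 bits must echo voltage range and check pattern. */
static int cmd8_echo_ok(const uint8_t r7[4])
{
    return (be32(r7) & 0x00000FFFu) == (CMD8_ARG & 0x00000FFFu);
}

/* CCS is only meaningful once the power-up bit is set. */
static int card_is_block_addressed(uint32_t ocr)
{
    return (ocr & OCR_POWER_UP) != 0u && (ocr & OCR_CCS) != 0u;
}

/* Argument for CMD17/CMD24: block number for SDHC/SDXC, byte offset for SDSC.
   SDSC cards are at most 2 GB, so the multiplication cannot overflow there. */
static uint32_t data_cmd_argument(uint32_t lba, int block_addressed)
{
    return block_addressed ? lba : lba * 512u;
}
```

If the driver ever treated a CCS = 1 card as byte-addressed (or the reverse), every read and write would land on the wrong sector, so this flag is worth logging once at mount time.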

Any suggestions, proven configurations, or example implementations that improve SD card stability over SPI would be greatly appreciated.

Thank you in advance!

SD cards are NAND-based, so they have limited write endurance. Most cards today use MLC or TLC flash, so they can tolerate a fair amount of use.

A program that writes to the card should use standard I/O procedures. Once a file is closed, there is no need for a sync call. I am good with C++.
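
If the log file is instead kept open for long periods, one option is to flush on a schedule so that only a bounded amount of data is at risk. Below is a rough sketch of such a policy in plain C. The type and function names are hypothetical, the thresholds are placeholders, and the place where FatFs `f_sync()` would be called is only marked in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flush policy: sync after N records or T milliseconds. */
typedef struct {
    uint32_t max_records;   /* flush once this many writes are pending */
    uint32_t max_age_ms;    /* ...or once this much time has passed    */
    uint32_t pending;       /* writes since the last flush             */
    uint32_t last_sync_ms;  /* tick value at the last flush            */
} sync_policy_t;

/* Call after every successful f_write(). */
static void sync_policy_note_write(sync_policy_t *p)
{
    assert(p != NULL);
    p->pending++;
}

/* Returns nonzero when a flush is due. The unsigned subtraction keeps the
   age check correct across a 32-bit tick counter wraparound. */
static int sync_policy_due(const sync_policy_t *p, uint32_t now_ms)
{
    assert(p != NULL);
    if (p->pending == 0u) {
        return 0;
    }
    return p->pending >= p->max_records ||
           (uint32_t)(now_ms - p->last_sync_ms) >= p->max_age_ms;
}

/* Call after the flush itself, e.g. after f_sync(&file) returned FR_OK. */
static void sync_policy_done(sync_policy_t *p, uint32_t now_ms)
{
    assert(p != NULL);
    p->pending = 0u;
    p->last_sync_ms = now_ms;
}
```

In the logging loop this would look like: write a record, call `sync_policy_note_write()`, and when `sync_policy_due()` returns nonzero, flush and call `sync_policy_done()`.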

Why don’t you erase the card, then write “LBAn = n” to each sector? After doing this, verify each sector. Then you will be able to see the nature of the corruption.

For example, are bits being flipped (data problem)? Are bits being dropped or duplicated (clock problem)?
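
A rough sketch of that test in C is below. It only builds and checks the per-sector pattern; the raw sector read and write calls are left to your existing disk I/O layer. The function and type names are made up for illustration, and the 512-byte sector size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u

/* Fill one sector with its own LBA, repeated as little-endian 32-bit words. */
static void fill_sector_pattern(uint8_t *buf, uint32_t lba)
{
    assert(buf != NULL);
    for (size_t i = 0; i < SECTOR_SIZE; i += 4) {
        buf[i + 0] = (uint8_t)(lba);
        buf[i + 1] = (uint8_t)(lba >> 8);
        buf[i + 2] = (uint8_t)(lba >> 16);
        buf[i + 3] = (uint8_t)(lba >> 24);
    }
}

typedef enum {
    SECTOR_OK,         /* matches the expected pattern                  */
    SECTOR_WRONG_LBA,  /* intact pattern, but belonging to another LBA  */
    SECTOR_BIT_ERRORS  /* anything else                                 */
} sector_verdict_t;

typedef struct {
    sector_verdict_t verdict;
    uint32_t found_lba;  /* only meaningful for SECTOR_WRONG_LBA        */
    uint32_t bad_bits;   /* bits that differ from the expected pattern  */
} sector_report_t;

static uint32_t count_bits(uint8_t v)
{
    uint32_t n = 0;
    while (v != 0u) { n += v & 1u; v >>= 1; }
    return n;
}

/* Compare a sector that was read back against the pattern for 'lba'. */
static sector_report_t check_sector(const uint8_t *buf, uint32_t lba)
{
    uint8_t expected[SECTOR_SIZE];
    sector_report_t r = { SECTOR_OK, 0u, 0u };

    assert(buf != NULL);
    fill_sector_pattern(expected, lba);
    for (size_t i = 0; i < SECTOR_SIZE; i++) {
        r.bad_bits += count_bits((uint8_t)(buf[i] ^ expected[i]));
    }
    if (r.bad_bits == 0u) {
        return r;
    }

    /* Is it a clean pattern for some other sector number? */
    uint32_t found = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
                     ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
    fill_sector_pattern(expected, found);
    if (memcmp(buf, expected, SECTOR_SIZE) == 0) {
        r.verdict = SECTOR_WRONG_LBA;
        r.found_lba = found;
    } else {
        r.verdict = SECTOR_BIT_ERRORS;
    }
    return r;
}
```

A handful of bad bits in a sector would point toward flipped data. A very large count, where everything after some offset is wrong, would be more consistent with a dropped or duplicated bit shifting the stream. A clean pattern with the wrong number would suggest the data went to, or came from, the wrong sector.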