SD6SF1M128G1022I: many ATA exceptions in syslog, looking for the cause

I purchased one of these promising drives to use as the boot drive for my OMV NAS.

Unfortunately I’m getting quite a few ATA exceptions in the syslog:

    Aug 10 16:26:52 msa-nas1 kernel: [1455.363285] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
    Aug 10 16:26:52 msa-nas1 kernel: [1455.363328] ata1.01: BMDMA stat 0x64
    Aug 10 16:26:52 msa-nas1 kernel: [1455.363350] ata1.01: failed command: READ DMA EXT
    Aug 10 16:26:52 msa-nas1 kernel: [1455.363379] ata1.01: cmd 25/00:00:a0:3e:44/00:02:0a:00:00/f0 tag 0 dma 262144 in
    Aug 10 16:26:52 msa-nas1 kernel: [1455.363379] res 51/84:7f:21:3f:44/84:01:0a:00:00/f0 Emask 0x10 (ATA bus error)
    Aug 10 16:26:52 msa-nas1 kernel: [1455.363456] ata1.01: status: { DRDY ERR }
    Aug 10 16:26:52 msa-nas1 kernel: [1455.363478] ata1.01: error: { ICRC ABRT }
    Aug 10 16:26:52 msa-nas1 kernel: [1455.363519] ata1: soft resetting link
    Aug 10 16:26:53 msa-nas1 kernel: [1455.547847] ata1.00: configured for UDMA/33
    Aug 10 16:26:53 msa-nas1 kernel: [1455.556895] ata1.01: configured for UDMA/33
    Aug 10 16:26:53 msa-nas1 kernel: [1455.556915] ata1: EH complete
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595322] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595375] ata1.01: BMDMA stat 0x64
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595401] ata1.01: failed command: READ DMA EXT
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595438] ata1.01: cmd 25/00:00:a8:4e:44/00:02:0a:00:00/f0 tag 0 dma 262144 in
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595438] res 51/84:8f:19:4f:44/84:01:0a:00:00/f0 Emask 0x10 (ATA bus error)
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595529] ata1.01: status: { DRDY ERR }
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595556] ata1.01: error: { ICRC ABRT }
    Aug 10 16:26:53 msa-nas1 kernel: [1455.595603] ata1: soft resetting link
    Aug 10 16:26:53 msa-nas1 kernel: [1455.783876] ata1.00: configured for UDMA/33
    Aug 10 16:26:53 msa-nas1 kernel: [1455.792925] ata1.01: configured for UDMA/33
    Aug 10 16:26:53 msa-nas1 kernel: [1455.792945] ata1: EH complete
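To get a feel for how often this happens, the ICRC errors can be tallied per ATA device. This is only a minimal sketch, assuming the kernel messages keep the shape shown above; the function name `count_icrc_errors` is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# count_icrc_errors (hypothetical helper): read kernel log lines on stdin and
# print "<device> <count>" for every ATA device that logged an
# "error: { ICRC ... }" line.
count_icrc_errors() {
    awk '
        /error: .*ICRC/ {
            # Find the field that names the device, e.g. "ata1.01:".
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^ata[0-9]+\.[0-9]+:$/) {
                    dev = $i
                    sub(/:$/, "", dev)
                    count[dev]++
                    break
                }
            }
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}
```

It could be fed from the log file or from the kernel ring buffer, for example `count_icrc_errors < /var/log/syslog` or `dmesg | count_icrc_errors`.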

A check with smartctl gave me this:

    root@msa-nas1:~# sudo smartctl -a /dev/sdf
    smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

    === START OF INFORMATION SECTION ===
    Model Family:     Marvell based SanDisk SSDs
    Device Model:     SanDisk SD6SF1M128G1022I
    Serial Number:    144211400493
    LU WWN Device Id: 5 001b44 c9f40532d
    Firmware Version: X231200
    User Capacity:    128,035,676,160 bytes [128 GB]
    Sector Size:      512 bytes logical/physical
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   8
    ATA Standard is:  ATA-8-ACS revision 6
    Local Time is:    Mon Aug 10 17:10:00 2015 HKT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                                            was never started.
                                            Auto Offline Data Collection: Disabled.
    Self-test execution status:      ( 0)   The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                 ( 0)   seconds.
    Offline data collection
    capabilities:                    (0x11) SMART execute Offline immediate.
                                            No Auto Offline data collection support.
                                            Suspend Offline collection upon new
                                            command.
                                            No Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            No Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        ( 2)   minutes.
    Extended self-test routine
    recommended polling time:        ( 10)  minutes.

    SMART Attributes Data Structure revision number: 4
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       740
     12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       32
    166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       0
    167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       29
    168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       7
    169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       209
    171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
    172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
    173 Avg_Write/Erase_Count   0x0032   100   100   ---    Old_age   Always       -       0
    174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       10
    187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
    194 Temperature_Celsius     0x0022   061   050   ---    Old_age   Always       -       39 (Min/Max 25/50)
    212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       426
    230 Perc_Write/Erase_Count  0x0032   100   100   ---    Old_age   Always       -       0
    232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
    233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       58
    241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       25
    242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       9
    243 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error         00%       715         -
    # 2  Extended offline    Completed without error         00%       691         -
    # 3  Extended offline    Completed without error         00%       667         -
    # 4  Extended offline    Completed without error         00%       644         -
    # 5  Extended offline    Completed without error         00%       621         -
    # 6  Extended offline    Interrupted (host reset)        00%       618         -
    # 7  Conveyance offline  Completed without error         00%         0         -
    # 8  Conveyance offline  Completed without error         00%         0         -
    # 9  Conveyance offline  Completed without error         00%         0         -
    #10  Extended offline    Completed without error         00%       409         -
    #11  Short offline       Aborted by host                 00%       409         -

    Device does not support Selective Self Tests/Logging

Any idea what is going wrong and why the SSD is causing all those exceptions in the syslog?

The SMART info actually says quite a bit about errors that have occurred. From smartctl -x:

    Device does not support Selective Self Tests/Logging
    Warning: device does not support SCT Commands
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x0001  4           54  Command failed due to ICRC error
    0x0002  4           76  R_ERR response for data FIS
    0x0005  4            0  R_ERR response for non-data FIS
    0x000a  4            8  Device-to-host register FISes sent due to a COMRESET

Thanks for any help!

The SMART data is showing some bad blocks. That could be the problem. You can try secure erasing the SSD and see if the issue continues to occur. If it does, it is probably best to contact SanDisk support for a replacement.


Thanks for the reply and the help.

I’ll do as you propose and secure erase the drive. That is not so easy, since the system is already in production and I have to find a replacement first…

Could you elaborate which part of the SMART output told you about the bad blocks? Thanks!

Edit: I just checked the second NAS, which has identical hardware. It has the same exceptions in the syslog. Can it really be a coincidence that two drives of the same model both throw the same errors in the syslog?

Anything I can try before I blame the hardware?

Attribute 169, Total_Bad_Block. The raw value is 209, which in and of itself does not necessarily mean a hardware problem. All flash will have some bad blocks, but if you see that number increasing quickly it may indicate a possible issue.
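A rough way to watch for that is to pull the raw value out of the attribute table from time to time and compare. The sketch below assumes the usual ten-column `smartctl -A` attribute layout, as in the output posted above; `smart_raw_value` is a hypothetical helper name.

```shell
#!/bin/sh
# smart_raw_value (hypothetical helper): read `smartctl -A` output on stdin
# and print the first token of RAW_VALUE for the attribute ID given as $1.
# Assumes the standard layout where RAW_VALUE starts at column 10.
smart_raw_value() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10; exit }'
}

# Example use (needs root and the real device, so it is left commented out):
#   smartctl -A /dev/sdf | smart_raw_value 169
```

Running it once a day and keeping the numbers would show whether attribute 169 is actually growing. The same helper works for attribute 212 (SATA_PHY_Error), which is already at 426 in the posted output.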

The bad blocks have not increased since I first posted. This does not seem to be an issue.

But I found other things in the dmesg log. It seems like something is going wrong.

The SanDisk SSD shows this performance:

    root@msa-nas1:~# hdparm -t -T /dev/sdf
    /dev/sdf:
     Timing cached reads:   2144 MB in 2.00 seconds = 1072.18 MB/sec
     Timing buffered disk reads:   8 MB in 3.60 seconds = 2.22 MB/sec

while my other, normal drives show this:

    root@msa-nas1:~# hdparm -t -T /dev/sdg
    /dev/sdg:
     Timing cached reads:   4164 MB in 2.00 seconds = 2082.04 MB/sec
     Timing buffered disk reads: 502 MB in 3.01 seconds = 166.81 MB/sec
    root@msa-nas1:~# hdparm -t -T /dev/sda
    /dev/sda:
     Timing cached reads:   4016 MB in 2.00 seconds = 2008.11 MB/sec
     Timing buffered disk reads: 504 MB in 3.00 seconds = 167.79 MB/sec

The SSD seems to be running in a slow transfer mode:

    root@msa-nas1:~# sudo hdparm -I /dev/sd{a,b,c,d,e,f,g} | grep -i udma
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
    DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
    DMA: mdma0 mdma1 mdma2 udma0 *udma1 udma2 udma3 udma4 udma5 udma6
    DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
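The starred entry on each line is the mode currently in use. To make that easier to read across several drives, here is a small sketch that extracts the starred mode and maps it to the nominal UDMA rate. Both function names are made up for illustration, and the rates are the nominal maximums for each mode, not measured throughput.

```shell
#!/bin/sh
# active_dma_mode (hypothetical helper): read `hdparm -I` output on stdin and
# print the starred (currently selected) mode from every "DMA:" line.
active_dma_mode() {
    awk '/DMA:/ {
        for (i = 1; i <= NF; i++)
            if (substr($i, 1, 1) == "*") { print substr($i, 2); break }
    }'
}

# udma_mbps (hypothetical helper): nominal maximum rate in MB/s for a UDMA
# mode name such as "udma2".
udma_mbps() {
    case "$1" in
        udma0) echo 16.7 ;;
        udma1) echo 25 ;;
        udma2) echo 33.3 ;;
        udma3) echo 44.4 ;;
        udma4) echo 66.7 ;;
        udma5) echo 100 ;;
        udma6) echo 133 ;;
        *)     echo unknown ;;
    esac
}
```

By this table, udma2 is what the kernel log calls UDMA/33 and udma6 is UDMA/133. The sixth line of the output above (presumably /dev/sdf, going by the order of the command) has even dropped to udma1, nominally 25 MB/s.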

But why?

The dmesg log shows that some things are wrong at boot… Any idea?

    [1.021623] ata1: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751100 irq 47
    [1.021627] ata2: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751180 irq 47
    [1.021629] ata3: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751200 irq 47
    [1.021631] ata4: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751280 irq 47
    [1.022724] scsi4 : pata_atiixp
    [1.022834] scsi5 : pata_atiixp
    [1.022887] ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf100 irq 14
    [1.022888] ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf108 irq 15
    ...
    [1.184383] ata6.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
    [1.184388] ata6.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 0/32)
    [1.184392] ata6.00: limited to UDMA/33 due to 40-wire cable
    [1.192314] ata6.00: configured for UDMA/33
    [1.192459] ata5.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
    [1.192460] ata5.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 0/32)
    [1.193091] ata5.01: ATA-8: SanDisk SD6SF1M128G1022I, X231200, max UDMA/133
    [1.193095] ata5.01: 250069680 sectors, multi 1: LBA48 NCQ (depth 0/32)
    [1.193743] ata5.00: limited to UDMA/33 due to 40-wire cable
    [1.193746] ata5.01: limited to UDMA/33 due to 40-wire cable
    [1.200190] ata5.00: configured for UDMA/33
    [1.209308] ata5.01: configured for UDMA/33
    ...
    [1.511838] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [1.511857] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [1.511872] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [1.511887] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
    [1.512605] ata4.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
    [1.512609] ata4.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    [1.512617] ata1.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
    [1.512619] ata1.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    [1.512624] ata3.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
    [1.512626] ata3.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    [1.512631] ata2.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
    [1.512634] ata2.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    [1.513323] ata4.00: configured for UDMA/133
    [1.513346] ata2.00: configured for UDMA/133
    [1.513352] ata3.00: configured for UDMA/133
    [1.513362] ata1.00: configured for UDMA/133
    ...
    [63210.387399] ata5.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
    [63210.387449] ata5.01: BMDMA stat 0x64
    [63210.387475] ata5.01: failed command: READ DMA
    [63210.387508] ata5.01: cmd c8/00:00:e0:cd:89/00:00:00:00:00/f8 tag 0 dma 131072 in
    [63210.387508] res 51/84:bf:21:ce:89/00:00:00:00:00/f8 Emask 0x10 (ATA bus error)
    [63210.387599] ata5.01: status: { DRDY ERR }
    [63210.387625] ata5.01: error: { ICRC ABRT }
    [63210.387672] ata5: soft resetting link
    [63210.579960] ata5.00: configured for UDMA/33
    [63210.589020] ata5.01: configured for UDMA/33
    [63210.589087] ata5: EH complete
    ....

What’s this about?

limited to UDMA/33 due to 40-wire cable
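For what it is worth, the boot log above shows ata5 and ata6 being handled by pata_atiixp as PATA ports, so the kernel applies PATA cable detection to whatever hangs off them, and that appears to be where the 40-wire message comes from. A minimal sketch for listing the devices that got capped this way, assuming the message wording stays exactly as in the log; `cable_limited_devices` is a hypothetical name:

```shell
#!/bin/sh
# cable_limited_devices (hypothetical helper): read kernel log lines on stdin
# and print each ATA device that libata reported as
# "limited to UDMA/<n> due to 40-wire cable".
cable_limited_devices() {
    awk '/limited to UDMA\/[0-9]+ due to 40-wire cable/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ata[0-9]+\.[0-9]+:$/) {
                dev = $i
                sub(/:$/, "", dev)
                print dev
                break
            }
    }'
}
```

Something like `dmesg | cable_limited_devices` would then print ata6.00, ata5.00 and ata5.01 for the log above, which matches the drives stuck at UDMA/33 or lower.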

Any comment from SanDisk on this would be much appreciated! I really wonder what’s going wrong here.

Thanks in advance for any help!

Well, I solved the problem… finally. http://unix.stackexchange.com/a/225781/41400

In case someone else gets stuck with this… But I cannot recommend this drive model if you intend to use it with Debian…