Newbie
Posts: 5
Registered: ‎08-10-2015
Accepted Solution

SD6SF1M128G1022I many ATA exceptions in syslog, looking for cause

I purchased one of these promising drives to use as the boot drive for my OMV NAS.

Unfortunately I'm getting quite a few ATA exceptions in the syslog:

Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363285] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363328] ata1.01: BMDMA stat 0x64
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363350] ata1.01: failed command: READ DMA EXT
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363379] ata1.01: cmd 25/00:00:a0:3e:44/00:02:0a:00:00/f0 tag 0 dma 262144 in
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363379]          res 51/84:7f:21:3f:44/84:01:0a:00:00/f0 Emask 0x10 (ATA bus error)
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363456] ata1.01: status: { DRDY ERR }
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363478] ata1.01: error: { ICRC ABRT }
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363519] ata1: soft resetting link
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.547847] ata1.00: configured for UDMA/33
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.556895] ata1.01: configured for UDMA/33
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.556915] ata1: EH complete
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595322] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595375] ata1.01: BMDMA stat 0x64
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595401] ata1.01: failed command: READ DMA EXT
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595438] ata1.01: cmd 25/00:00:a8:4e:44/00:02:0a:00:00/f0 tag 0 dma 262144 in
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595438]          res 51/84:8f:19:4f:44/84:01:0a:00:00/f0 Emask 0x10 (ATA bus error)
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595529] ata1.01: status: { DRDY ERR }
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595556] ata1.01: error: { ICRC ABRT }
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595603] ata1: soft resetting link
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.783876] ata1.00: configured for UDMA/33
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.792925] ata1.01: configured for UDMA/33
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.792945] ata1: EH complete
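For anyone wanting to quantify how often this happens, here is a minimal Python sketch that counts libata exceptions per device and tallies the error flags. The sample lines are copied from the excerpt above; the function name `summarize` and the regular expressions are my own assumptions about the line layout, not part of any tool. For reference, ICRC is the ATA error-register bit for an interface CRC error during an Ultra DMA transfer, and ABRT means the command was aborted.

```python
import re
from collections import Counter

# Sample lines copied verbatim from the syslog excerpt above.
SAMPLE = """\
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363285] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363350] ata1.01: failed command: READ DMA EXT
Aug 10 16:26:52 msa-nas1 kernel: [ 1455.363478] ata1.01: error: { ICRC ABRT }
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595322] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595401] ata1.01: failed command: READ DMA EXT
Aug 10 16:26:53 msa-nas1 kernel: [ 1455.595556] ata1.01: error: { ICRC ABRT }
"""

# Assumed line shapes, based only on the excerpt above.
EXCEPTION_RE = re.compile(r"(ata\d+\.\d+): exception ")
ERROR_RE = re.compile(r"(ata\d+\.\d+): error: \{ ([A-Z ]+) \}")


def summarize(log_text):
    """Return (exception count per device, count per (device, flag))."""
    exceptions = Counter()
    flags = Counter()
    for line in log_text.splitlines():
        m = EXCEPTION_RE.search(line)
        if m:
            exceptions[m.group(1)] += 1
        m = ERROR_RE.search(line)
        if m:
            for flag in m.group(2).split():
                flags[(m.group(1), flag)] += 1
    return exceptions, flags


exceptions, flags = summarize(SAMPLE)
print(dict(exceptions))
print(dict(flags))
```

Feeding it the whole syslog instead of `SAMPLE` should show whether the errors are confined to one device or spread over several.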

A check with smartctl gave me this:

root@msa-nas1:~# sudo smartctl -a /dev/sdf
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SD6SF1M128G1022I
Serial Number:    144211400493
LU WWN Device Id: 5 001b44 c9f40532d
Firmware Version: X231200
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Mon Aug 10 17:10:00 2015 HKT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       740
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       32
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       0
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       29
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       7
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       209
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0032   100   100   ---    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       10
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   061   050   ---    Old_age   Always       -       39 (Min/Max 25/50)
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       426
230 Perc_Write/Erase_Count  0x0032   100   100   ---    Old_age   Always       -       0
232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       58
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       25
242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       9
243 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       715         -
# 2  Extended offline    Completed without error       00%       691         -
# 3  Extended offline    Completed without error       00%       667         -
# 4  Extended offline    Completed without error       00%       644         -
# 5  Extended offline    Completed without error       00%       621         -
# 6  Extended offline    Interrupted (host reset)      00%       618         -
# 7  Conveyance offline  Completed without error       00%         0         -
# 8  Conveyance offline  Completed without error       00%         0         -
# 9  Conveyance offline  Completed without error       00%         0         -
#10  Extended offline    Completed without error       00%       409         -
#11  Short offline       Aborted by host               00%       409         -

Device does not support Selective Self Tests/Logging
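To keep an eye on individual attributes without reading the whole table each time, something like the following sketch could be used. It assumes the ten-column layout shown in the smartctl output above and that the raw value starts with an integer; other drives may report raw values in a different form. The name `parse_attributes` is hypothetical, and the sample rows are copied from the table above.

```python
# A few rows copied from the attribute table above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       209
194 Temperature_Celsius     0x0022   061   050   ---    Old_age   Always       -       39 (Min/Max 25/50)
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       426
"""


def parse_attributes(text):
    """Map attribute ID -> (name, leading integer of the raw value)."""
    attrs = {}
    for line in text.splitlines():
        # ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit():
            raw = fields[9].split()[0]  # drop trailing "(Min/Max ...)" text
            attrs[int(fields[0])] = (fields[1], int(raw))
    return attrs


attrs = parse_attributes(SAMPLE)
for attr_id in (169, 212):
    name, raw = attrs[attr_id]
    print(attr_id, name, raw)
```

Saving the result of each run and comparing it with the previous one would show whether a counter such as 212 SATA_PHY_Error keeps climbing.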

Any idea what is going wrong, and why the SSD is causing all those exceptions in the syslog?

The SMART info actually says quite a bit about errors that occurred. From smartctl -x:

Device does not support Selective Self Tests/Logging
Warning: device does not support SCT Commands
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4           54  Command failed due to ICRC error
0x0002  4           76  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            8  Device-to-host register FISes sent due to a COMRESET
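The Phy event counters can be filtered the same way. This is only a sketch that assumes the four-column layout printed above (ID, size, value, description); `nonzero_counters` is a made-up helper name. The first counter is described as "Command failed due to ICRC error", which lines up with the ICRC flag in the kernel messages, so both views seem to point at the link between drive and controller.

```python
import re

# Rows copied from the "SATA Phy Event Counters" table above.
SAMPLE = """\
0x0001  4           54  Command failed due to ICRC error
0x0002  4           76  R_ERR response for data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x000a  4            8  Device-to-host register FISes sent due to a COMRESET
"""

# ID, size in bytes, counter value, free-text description.
ROW_RE = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(.*)$")


def nonzero_counters(text):
    """Return (id, value, description) for every counter above zero."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m and int(m.group(3)) > 0:
            rows.append((m.group(1), int(m.group(3)), m.group(4)))
    return rows


counters = nonzero_counters(SAMPLE)
for counter_id, value, description in counters:
    print(counter_id, value, description)
```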

Thanks for any help!

SanDisk Guru
Posts: 4,227
Registered: ‎07-18-2007

Re: SD6SF1M128G1022I many ATA exceptions in syslog, looking for cause

The SMART data shows some bad blocks, which could be the problem. You can try secure erasing the SSD and see whether the issue continues to occur. If it does, it is probably best to contact SanDisk support for a replacement.

Newbie
Posts: 5
Registered: ‎08-10-2015

Re: SD6SF1M128G1022I many ATA exceptions in syslog, looking for cause

Thanks for the reply and the help.

I'll do as you propose and secure erase the drive. That is not so easy, since the system is already in production and I have to find a replacement first...

Could you elaborate on which part of the SMART output told you about the bad blocks? Thanks!

Edit: I just checked the second NAS, which has identical hardware. It has the same exceptions in the syslog. Can it really be a coincidence that two drives of the same model both throw the same errors in the syslog?

Anything I can try before I blame the hardware?

SanDisk Guru
Posts: 4,227
Registered: ‎07-18-2007

Re: SD6SF1M128G1022I many ATA exceptions in syslog, looking for cause

Attribute 169, Total_Bad_Block. The raw value is 209, which in and of itself does not necessarily mean a hardware problem. All flash will have some bad blocks, but if you see that number increasing quickly it may indicate a possible issue.

Newbie
Posts: 5
Registered: ‎08-10-2015

Re: SD6SF1M128G1022I many ATA exceptions in syslog, looking for cause

The bad blocks have not increased since I first posted, so this does not seem to be the issue.

But I found other things in the dmesg log. It seems like something is going wrong.

The SanDisk SSD shows this performance:

root@msa-nas1:~# hdparm -t -T /dev/sdf

/dev/sdf:
 Timing cached reads:   2144 MB in  2.00 seconds = 1072.18 MB/sec
 Timing buffered disk reads:   8 MB in  3.60 seconds =   2.22 MB/sec

 

while my other, normal drives show this:

root@msa-nas1:~# hdparm -t -T /dev/sdg

/dev/sdg:
 Timing cached reads:   4164 MB in  2.00 seconds = 2082.04 MB/sec
 Timing buffered disk reads: 502 MB in  3.01 seconds = 166.81 MB/sec
root@msa-nas1:~# hdparm -t -T /dev/sda

/dev/sda:
 Timing cached reads:   4016 MB in  2.00 seconds = 2008.11 MB/sec
 Timing buffered disk reads: 504 MB in  3.00 seconds = 167.79 MB/sec
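To put a number on the gap, the buffered read rates can be pulled out of the hdparm output and compared. A minimal sketch, assuming the output layout shown above; `buffered_read_rates` is a hypothetical helper and the sample text is copied from the two runs for /dev/sdf and /dev/sdg.

```python
import re

# hdparm -t -T output copied from above for the SSD (sdf) and one other drive (sdg).
SAMPLE = """\
/dev/sdf:
 Timing cached reads:   2144 MB in  2.00 seconds = 1072.18 MB/sec
 Timing buffered disk reads:   8 MB in  3.60 seconds =   2.22 MB/sec
/dev/sdg:
 Timing cached reads:   4164 MB in  2.00 seconds = 2082.04 MB/sec
 Timing buffered disk reads: 502 MB in  3.01 seconds = 166.81 MB/sec
"""

READS_RE = re.compile(
    r"Timing buffered disk reads:\s+(\d+) MB in\s+([\d.]+) seconds =\s+([\d.]+) MB/sec"
)


def buffered_read_rates(text):
    """Map device path -> buffered read rate in MB/sec as printed by hdparm."""
    rates = {}
    device = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("/dev/") and line.endswith(":"):
            device = line[:-1]
        m = READS_RE.search(line)
        if m and device:
            rates[device] = float(m.group(3))
    return rates


rates = buffered_read_rates(SAMPLE)
ratio = rates["/dev/sdg"] / rates["/dev/sdf"]
print(f"sdg reads about {ratio:.0f}x faster than sdf")
```

With the figures above, the SSD reads roughly 75 times slower than the other drive, far more than a difference in DMA mode alone would explain, which fits a link that keeps resetting and retrying.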

It seems to be running in a slower DMA mode than the other drives:

root@msa-nas1:~# sudo hdparm -I /dev/sd{a,b,c,d,e,f,g} | grep -i udma
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
        DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
        DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
        DMA: mdma0 mdma1 mdma2 udma0 *udma1 udma2 udma3 udma4 udma5 udma6
        DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
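In this listing the asterisk marks the mode the drive reports as currently selected. The sketch below extracts it per line and maps it to the nominal maximum transfer rate of each Ultra DMA mode. The helper name `active_dma_mode` is my own, and mapping the lines to /dev/sda through /dev/sdg assumes hdparm printed them in the order of its arguments.

```python
# DMA lines copied from the hdparm -I output above, in order.
SAMPLE = """\
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
DMA: mdma0 mdma1 mdma2 udma0 *udma1 udma2 udma3 udma4 udma5 udma6
DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 udma6
"""

# Nominal maximum transfer rates of the Ultra DMA modes, in MB/s.
UDMA_MB_PER_S = {
    "udma0": 16.7, "udma1": 25.0, "udma2": 33.3, "udma3": 44.4,
    "udma4": 66.7, "udma5": 100.0, "udma6": 133.0,
}


def active_dma_mode(line):
    """Return the mode marked with '*' on one DMA line, or None."""
    for token in line.split():
        if token.startswith("*"):
            return token[1:]
    return None


# Assumption: output order follows the argument order sd{a,b,c,d,e,f,g}.
devices = ["sda", "sdb", "sdc", "sdd", "sde", "sdf", "sdg"]
modes = [active_dma_mode(line) for line in SAMPLE.splitlines()]
for dev, mode in zip(devices, modes):
    print(dev, mode, UDMA_MB_PER_S.get(mode), "MB/s nominal")
```

Under that ordering assumption, the sixth line belongs to the SSD on /dev/sdf, which is the only drive reporting udma1.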

But why?

The dmesg log shows that some things go wrong at boot. Any idea?

[    1.021623] ata1: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751100 irq 47
[    1.021627] ata2: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751180 irq 47
[    1.021629] ata3: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751200 irq 47
[    1.021631] ata4: SATA max UDMA/133 abar m2048@0xff751000 port 0xff751280 irq 47
[    1.022724] scsi4 : pata_atiixp
[    1.022834] scsi5 : pata_atiixp
[    1.022887] ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf100 irq 14
[    1.022888] ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf108 irq 15
...
[    1.184383] ata6.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
[    1.184388] ata6.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    1.184392] ata6.00: limited to UDMA/33 due to 40-wire cable
[    1.192314] ata6.00: configured for UDMA/33
[    1.192459] ata5.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
[    1.192460] ata5.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[    1.193091] ata5.01: ATA-8: SanDisk SD6SF1M128G1022I, X231200, max UDMA/133
[    1.193095] ata5.01: 250069680 sectors, multi 1: LBA48 NCQ (depth 0/32)
[    1.193743] ata5.00: limited to UDMA/33 due to 40-wire cable
[    1.193746] ata5.01: limited to UDMA/33 due to 40-wire cable
[    1.200190] ata5.00: configured for UDMA/33
[    1.209308] ata5.01: configured for UDMA/33
...
[    1.511838] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.511857] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.511872] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.511887] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.512605] ata4.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
[    1.512609] ata4.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    1.512617] ata1.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
[    1.512619] ata1.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    1.512624] ata3.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
[    1.512626] ata3.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    1.512631] ata2.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
[    1.512634] ata2.00: 11721045168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    1.513323] ata4.00: configured for UDMA/133
[    1.513346] ata2.00: configured for UDMA/133
[    1.513352] ata3.00: configured for UDMA/133
[    1.513362] ata1.00: configured for UDMA/133
...
[63210.387399] ata5.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
[63210.387449] ata5.01: BMDMA stat 0x64
[63210.387475] ata5.01: failed command: READ DMA
[63210.387508] ata5.01: cmd c8/00:00:e0:cd:89/00:00:00:00:00/f8 tag 0 dma 131072 in
[63210.387508]          res 51/84:bf:21:ce:89/00:00:00:00:00/f8 Emask 0x10 (ATA bus error)
[63210.387599] ata5.01: status: { DRDY ERR }
[63210.387625] ata5.01: error: { ICRC ABRT }
[63210.387672] ata5: soft resetting link
[63210.579960] ata5.00: configured for UDMA/33
[63210.589020] ata5.01: configured for UDMA/33
[63210.589087] ata5: EH complete
....
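The boot messages can be scanned for devices that end up configured below the mode they advertise. This is a rough sketch with regular expressions guessed from the lines above, and `throttled_devices` is a hypothetical name. In the log above, the "limited to UDMA/33" lines only appear on ata5 and ata6, the two ports registered by pata_atiixp, while the drives on ata1 to ata4 are configured for UDMA/133.

```python
import re

# A subset of the dmesg lines above, copied verbatim.
SAMPLE = """\
[    1.022724] scsi4 : pata_atiixp
[    1.022887] ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf100 irq 14
[    1.193091] ata5.01: ATA-8: SanDisk SD6SF1M128G1022I, X231200, max UDMA/133
[    1.193746] ata5.01: limited to UDMA/33 due to 40-wire cable
[    1.209308] ata5.01: configured for UDMA/33
[    1.512617] ata1.00: ATA-9: WDC WD60EFRX-68MYMN1, 82.00A82, max UDMA/133
[    1.513362] ata1.00: configured for UDMA/133
"""

MODEL_RE = re.compile(r"(ata\d+\.\d+): ATA-\d+: ([^,]+), .*max (UDMA/\d+)")
LIMITED_RE = re.compile(r"(ata\d+\.\d+): limited to (UDMA/\d+) due to (.+)")
CONFIGURED_RE = re.compile(r"(ata\d+\.\d+): configured for (UDMA/\d+)")


def throttled_devices(dmesg_text):
    """Return devices whose configured mode differs from their advertised max."""
    info = {}
    for line in dmesg_text.splitlines():
        m = MODEL_RE.search(line)
        if m:
            info.setdefault(m.group(1), {}).update(model=m.group(2), max=m.group(3))
        m = LIMITED_RE.search(line)
        if m:
            info.setdefault(m.group(1), {})["reason"] = m.group(3)
        m = CONFIGURED_RE.search(line)
        if m:
            info.setdefault(m.group(1), {})["configured"] = m.group(2)
    return {
        dev: d for dev, d in info.items()
        if d.get("max") and d.get("configured") and d["max"] != d["configured"]
    }


throttled = throttled_devices(SAMPLE)
for dev, d in throttled.items():
    print(dev, d["model"], d["max"], "->", d["configured"], "reason:", d.get("reason"))
```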

What is this about?

limited to UDMA/33 due to 40-wire cable

Newbie
Posts: 5
Registered: ‎08-10-2015

Re: SD6SF1M128G1022I many ATA exceptions in syslog, looking for cause

Any comment from SanDisk on this would be much appreciated! I really wonder what's going wrong here.

Thanks in advance for any help!

Newbie
Posts: 5
Registered: ‎08-10-2015

Re: SD6SF1M128G1022I many ATA exceptions in syslog, looking for cause

Well, I finally solved the problem. In case someone else gets stuck with this, see http://unix.stackexchange.com/a/225781/41400

That said, I cannot recommend this drive model if you intend to use it with Debian...