Bad blocks (and kernel panic) with SanDisk Extreme Pro 480GB.

I recently started using a SanDisk Extreme Pro 480GB on a laptop running Debian testing (Linux).

It worked fine at first, but a couple of times over the past few days the system has hung or shown kernel panic messages, reporting that the filesystem (XFS) failed to write.

That’s what smartctl -a reports:

smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.5.0-2-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SDSSDXPS480G
Serial Number:    ######
LU WWN Device Id: ######
Firmware Version: X21200RL
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jun 7 18:13:48 2016 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (   0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   014   100   ---    Old_age   Always       -       14
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       46
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       0
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       46
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       2
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       935
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0032   100   100   ---    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       7
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   059   043   ---    Old_age   Always       -       41 (Min/Max 25/43)
199 SATA_CRC_Error          0x0032   100   100   ---    Old_age   Always       -       0
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       0
230 Perc_Write/Erase_Count  0x0032   100   100   ---    Old_age   Always       -       0
232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       40
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       17
242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       11
244 Thermal_Throttle        0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

As you can see, it already shows 935 bad blocks. Is that normal, or is the SSD faulty?

Also, the temperature is shown as 41. Is that rather high? The laptop’s original drive (an HDD) was wrapped in a foil cover. When I replaced it with the SSD, I wasn’t sure whether the foil was needed, so I just transferred it to the SSD. Maybe that’s not a good idea heat-wise?

Thanks!

Hm, I found this, which sounds very suspicious: http://www.thinkwiki.org/wiki/Category:W540

Activating the NVIDIA GPU causes memory corruption leading to crashes and file system corruption. https://github.com/Bumblebee-Project/bbswitch/issues/78

Hi Shmerl,

I don’t have much experience with Debian, but regarding the bad block count: since you have a 480GB drive, I wouldn’t be too worried unless we see a rapid increase in the bad block count. In other words, this is still within the acceptable range.

As for the drive temperature, it is within spec and you should not worry about it. If you do a lot of writes to the drive, the temperature will rise a little, but I don’t see any problem with it.
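
To check whether the bad block count is actually increasing, one option is to save the smartctl attribute table from time to time and compare the raw values. The following is only a rough Python sketch, not an official tool: it assumes the attribute rows look like the ones posted above (ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value), and the function names parse_raw_values and bad_block_growth are made up for illustration.

```python
import re

# One row of the smartctl attribute table, assuming the column layout
# shown in this thread:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_raw_values(smartctl_text):
    """Return {attribute_name: raw_value} for every attribute row found.

    Only the leading integer of the raw value is kept, so a row such as
    '41 (Min/Max 25/43)' yields 41.
    """
    values = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if match:
            values[match.group(2)] = int(match.group(3))
    return values


def bad_block_growth(old_text, new_text, attribute="Total_Bad_Block"):
    """Difference in an attribute's raw value between two saved outputs.

    'Total_Bad_Block' is the name this drive reports; other drives may
    use a different attribute name (that default is an assumption).
    """
    old = parse_raw_values(old_text)
    new = parse_raw_values(new_text)
    return new[attribute] - old[attribute]
```

For example, save the output of smartctl -A for the drive to a file today and again in a week, read both files, and pass their contents to bad_block_growth. A result that stays at or near zero would match the "no rapid increase" case described above.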

it not working properly is a bad implementation of it in a crappy SSD controller.