
Flash drive reliability - A study


skp51443


The Register has a good synopsis of a study that should interest anyone thinking about buying a flash drive, and the linked PDF is interesting reading, if a bit lacking in plot or suspense for non-geeks. :-)

 

http://www.theregister.co.uk/2016/02/28/commodity_flash_just_as_good_as_enterprise_kit_google_data_suggests/

 

 

If you're loading up a heap of flash drives for your data centre, don't bother with “enterprise-class” SLC (single level cell) technology, because cheaper MLC (multi-level cell) drives will do the job just as well.

However, the data centre biz needs new techniques to predict drive failures, because the unrecoverable bit error rate (UBER) sysadmins watch to spot spinning rust going to sleep is useless for flash media.
Those are two conclusions in Google-backed research from the University of Toronto, Flash Reliability in Production: The Expected and the Unexpected, presented to last week's Usenix FAST 16 conference.
Bianca Schroeder worked with Googlers Raghav Lagisetty and Arif Merchant to slice and dice more than six years' worth of production data on “many millions of drive days, ten different drive models, different Flash technologies”, gathered from Google's vast fleet of computing devices.

 

The direct link to the PDF:

 

http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/23105-fast16-papers-schroeder.pdf

 

I found the following most interesting; it is something I'm going to check on any new flash drive I buy.

 

 

Interestingly, we find that the number of factory bad blocks is to some degree predictive of other issues the drive might develop in the field: For example, we observe that for all but one drive model the drives that have above the 95%ile of factory bad blocks have a higher fraction of developing new bad blocks in the field and final write errors, compared to an average drive of the same model. They also have a higher fraction that develops some type of read error (either final or non-final). The drives in the bottom 5%ile have a lower fraction of timeout errors than average.

 

We summarize our observations regarding bad blocks as follows: Bad blocks are common: 30-80% of drives develop at least one in the field. The degree of correlation between bad blocks in a drive is surprisingly strong: after only 2-4 bad blocks on a drive, there is a 50% chance that hundreds of bad blocks will follow. Nearly all drives come with factory bad blocks, and the number of factory bad blocks shows a correlation with the number of bad blocks the drive will develop in the field, as well as a few other errors that occur in the field.
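To make the 95th-percentile idea concrete, here is a minimal Python sketch. It assumes you have already collected factory and grown bad-block counts for a batch of drives of one model; the dict input format and the function names are hypothetical. It flags drives above their model's 95th percentile for factory bad blocks, plus drives already showing 2 or more grown bad blocks (the low end of the 2-4 range quoted above). It only illustrates the comparison; it is not the paper's analysis method.

```python
import statistics

def percentile_95(counts):
    """95th percentile of a list of counts (needs at least two values)."""
    # With n=20, quantiles() returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(counts, n=20, method="inclusive")[-1]

def flag_drives(factory_bad_blocks, grown_bad_blocks):
    """
    Both arguments are hypothetical dicts of drive id -> count, all for ONE drive model.
    Returns drive id -> list of warning strings; drives with no warnings are left out.
    """
    cutoff = percentile_95(list(factory_bad_blocks.values()))
    flagged = {}
    for drive, factory in factory_bad_blocks.items():
        notes = []
        if factory > cutoff:
            notes.append("factory bad blocks above 95th percentile for this model")
        # Alert at 2 grown bad blocks, the low end of the paper's "2-4" figure.
        # The threshold is an assumption, not a rule from the paper.
        if grown_bad_blocks.get(drive, 0) >= 2:
            notes.append("2+ grown bad blocks; watch closely")
        if notes:
            flagged[drive] = notes
    return flagged

# Made-up counts for five drives of the same model
factory = {"a": 80, "b": 82, "c": 85, "d": 90, "e": 300}
grown = {"a": 0, "b": 3, "e": 0}
print(flag_drives(factory, grown))
```

With a single drive there is nothing to compare against, so a raw factory bad block count only means something next to other drives of the same model.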

 

 

We observe significant differences in the repair rates between different models. While for most drive models 6-9% of their population at some point required repairs, there are some drive models, e.g. SLC-B and SLC-C, that enter repairs at significantly higher rates of 30% and 26%, respectively. Looking at the time between repairs (i.e. dividing the total number of drive days by the total number of repair events, see row 3 in Table 5) we see a range of a couple of thousand days between repairs for the worst models to nearly 15,000 days between repairs for the best models. We also looked at how often in their life drives entered repairs: The vast majority (96%) of drives that go to repairs, go there only once in their life.
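The "time between repairs" figure in that excerpt is just total drive days divided by total repair events. A small Python sketch of that arithmetic, using made-up fleet numbers (the function names and the example figures are hypothetical, not from the paper):

```python
def days_between_repairs(total_drive_days, repair_events):
    """Average drive days per repair event (total drive days / repair events)."""
    if repair_events == 0:
        return float("inf")  # no repairs seen in the observation period
    return total_drive_days / repair_events

def repair_fraction(drives_repaired, population):
    """Share of a drive population that entered repairs at least once."""
    return drives_repaired / population

# Hypothetical fleet: 1,000 drives watched for 3 years (1,095 days each)
drive_days = 1_000 * 1_095
print(days_between_repairs(drive_days, 80))  # 13687.5 days between repairs
print(repair_fraction(70, 1_000))            # 0.07, i.e. 7% of drives repaired
```

The two numbers measure different things: a model with a low repair fraction could still show a short time between repairs if the same drives kept going back, although the paper says 96% of drives that went to repairs went only once.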

 

Sections 7 and 8, on page 78 of the PDF, are worth a glance too; they discuss the various types of flash and compare them with hard drives.

 

The summary, Section 10 on page 79, is a quick read if you want to skip the more technical details.


This study is aimed at solid state drives in data centers rather than memory sticks. While the basic memory chips are similar, the sticks differ a lot in internal operation, so it is hard to use these numbers to make predictions about memory sticks.

 

The numbers above might be useful when looking at drives that are internally similar: SSDs in SATA II or III, M.2, U.2, SATA Express (part of SATA 3.2), or NVMe versions.

 

Here is some info on USB interface flash drives, not as detailed but interesting.

 

http://www.zdnet.com/article/usb-drive-life-fact-or-fiction/


Here are the first lines of the abstract:

"As solid state drives based on flash technology are becoming a staple for persistent data storage in data centers, it is important to understand their reliability characteristics."

 

So while lay people call the little microSD, SDHC, and SDXC cards, or thumb drives/USB flash drives, "flash drives", and that is accurate, they are not what this paper is about. This paper (which is excellent, BTW, thanks) is about what we commonly call SSDs.

 

The main takeaway for me is that, given the same or similar read and write specs, MLC SSDs are as reliable as SLC SSDs, which is the opposite of what I thought. So, comparing drives of the same manufacturer and build quality, I can buy the less expensive MLC SSDs and expect the same reliability.

 

Beyond that, the researchers point out that SSDs need new failure-prediction techniques, like the ones we now have for hard drives.

 

The report also points up the need to check a new drive for errors, such as factory bad blocks, on receipt and, for now, to use that as a rough guide to its future reliability.
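One way to act on that, sketched in Python under some assumptions: record the drive's SMART raw counts when it arrives, then compare later readings against that baseline. The attribute names are the ones the Crucial/Micron m4 reports in the smartctl output later in this thread; other drives may name them differently, and the watch list and function name are my own hypothetical choices.

```python
# Hypothetical watch list: error counters where any increase is worth a look.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Grown_Failing_Block_Ct",
    "Program_Fail_Count",
    "Erase_Fail_Count",
    "Reported_Uncorrect",
)

def new_errors(baseline, current, watched=WATCHED):
    """Return {attribute: (baseline_value, current_value)} for watched counters that grew."""
    return {
        name: (baseline[name], current[name])
        for name in watched
        if name in baseline and name in current and current[name] > baseline[name]
    }

# Made-up readings: at receipt, and again six months later
at_receipt = {"Grown_Failing_Block_Ct": 0, "Reallocated_Sector_Ct": 0, "Power_On_Hours": 10}
six_months = {"Grown_Failing_Block_Ct": 3, "Reallocated_Sector_Ct": 0, "Power_On_Hours": 4400}
print(new_errors(at_receipt, six_months))  # {'Grown_Failing_Block_Ct': (0, 3)}
```

Counters like Power_On_Hours are left off the watch list because they always go up and say nothing about errors.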

 

This points up the wisdom of keeping regular backups of your entire system. My recommendation has always been to make images, which is a bit harder with Windows 8/8.1 and 10.

 

Regularly making an exact bit-by-bit clone of a drive to an external drive and keeping it in reserve is a good practice that achieves the same result. Several programs will clone, say, 300 GB of data on a 1 TB HD to a smaller 500 GB drive. The smaller drive can hold the data, but a true bit-by-bit copy of the whole 1 TB drive won't fit on it, and it may not be possible with the new boot security anyway. Since I have 20 or so individual HDs, from 500 GB up to 3 TB, in both 2.5" and 3.5" form factors, this article prompts me to get off my duff and put this at the top of my priority list. Once I figure out a good way to make complete system images and restore them myself on the new high-security Windows hardware and software, I'll post here.

 

In the meantime, for those who think images and clones are over their heads: make sure you have a copy of every program installed on your PC (not bootleg copies, as they may not work again). Once you know all your software is saved to discs or drives, or you have the disc copy you ordered, make sure you back up all your data. Windows makes that easy, as I posted here recently.


Thanks for the interesting info on SSDs. I tried to find out how many bad blocks are on the SSD (main drive) in my Mac, but the drive test utility just reports the drive as passing all its tests without reporting the number of bad blocks. Some research indicates I probably need a third-party disk utility.

 

Cloning the main drive on an Apple OSX computer is very easy using a utility called "SuperDuper!". It's quite fast and the computer can be booted and operated from the clone drive exactly as if it were the main drive in the computer. I've been using this backup program for several years on several versions of OSX and never had a problem with it.

 

---ron


I don't know Macs, but I am pretty sure they support SMART, a drive monitoring system that might give you your bad block info. Try a Google search with these terms:

mac smart status utility

This is a SMART report from my Linux box; the lines "Grown_Failing_Block_Ct" and "Factory_Bad_Block_Ct" are the ones you'd want to look at:

p490:/home/stan # smartctl -a /dev/sda 
smartctl 6.2 2013-11-07 r3856 [x86_64-linux-4.1.15-8-default] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron RealSSD m4/C400/P400
Device Model:     M4-CT128M4SSD2
Serial Number:    000000001139031A2F56
LU WWN Device Id: 5 00a075 1031a2f56
Firmware Version: 070H
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Wed Mar  2 12:11:13 2016 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  595) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   9) minutes.
Conveyance self-test routine
recommended polling time:        (   3) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always       -       15939
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always       -       1463
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always       -       0
173 Wear_Leveling_Count     0x0033   099   099   010    Pre-fail  Always       -       53
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always       -       113
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       108 11 96
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   001    Old_age   Always       -       0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       82
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always       -       0
202 Perc_Rated_Life_Used    0x0018   099   099   001    Old_age   Offline      -       1
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

A lot of Linux utilities give the info in a friendlier format, but that is hard to copy/paste to the forums. I'd guess the same is true for the Mac line.
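If you'd rather not eyeball the whole report, here is a minimal Python sketch that pulls the raw values out of the attribute table. It assumes the ATA table layout shown above (ten columns, raw value last); the function names are my own, and attribute names vary by drive maker, so Factory_Bad_Block_Ct and Grown_Failing_Block_Ct may not exist on other drives.

```python
def parse_smart_attributes(report_text):
    """Map attribute name -> raw value string from `smartctl -a` (ATA) output."""
    attrs = {}
    in_table = False
    for line in report_text.splitlines():
        if line.startswith("ID# ATTRIBUTE_NAME"):
            in_table = True  # header row of the attribute table
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # the table ends at the first blank line
        # Split into at most 10 fields so raw values like "108 11 96" stay whole.
        fields = line.split(maxsplit=9)
        if len(fields) == 10 and fields[0].isdigit():
            attrs[fields[1]] = fields[9].strip()
    return attrs

def bad_block_counts(report_text):
    """Return the factory and grown bad-block counts, if the drive reports them."""
    attrs = parse_smart_attributes(report_text)
    names = ("Factory_Bad_Block_Ct", "Grown_Failing_Block_Ct")
    return {name: int(attrs[name]) for name in names if name in attrs}

# On Linux, report_text could come from (usually run as root):
# subprocess.run(["smartctl", "-a", "/dev/sda"], capture_output=True, text=True).stdout
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always       -       0
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always       -       108 11 96
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always       -       82
"""
print(bad_block_counts(sample))  # {'Factory_Bad_Block_Ct': 82, 'Grown_Failing_Block_Ct': 0}
```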


Looking at SSDs and the PDF above, better fault management is really going to become important as drive sizes keep climbing.

 

This new Samsung drive doesn't really apply to anyone except Powerball winners and folks spending company money, but it bodes well as the tech trickles down:

 

http://www.theregister.co.uk/2016/03/03/samsung_2_5_inch_15tb_ssd/

 

 

Samsung's 2.5-inch 15.36TB SSD is now shipping, with half as much capacity again as the 10TB 3.5-inch disk drive capacity alternative, and taking up less physical space.

This relative monster of an SSD first appeared at the Flash Memory Summit in August 2015.
The PM1633a drive houses 512 x 256Gb 48-layer V-NAND chips stacked in 16 layers to create a 512GB package. Thirty two of these packages are then used inside the 15.36TB drive. So the individual chips are 3D in nature, and then the 512GB packages with their 16 layers are 3D too.
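The article's numbers hang together. A quick Python check of the arithmetic, using only the figures quoted above (8 bits per byte, decimal units as the article uses them; the variable names are mine):

```python
CHIP_GBIT = 256         # each V-NAND chip: 256 Gb (gigabits)
CHIPS_PER_PACKAGE = 16  # "stacked in 16 layers"
PACKAGES = 32           # "Thirty two of these packages"

package_gb = CHIP_GBIT * CHIPS_PER_PACKAGE // 8  # 4,096 Gb / 8 = 512 GB per package
total_chips = CHIPS_PER_PACKAGE * PACKAGES       # 512 chips in the drive
raw_gb = package_gb * PACKAGES                   # 16,384 GB of raw flash

print(package_gb, total_chips, raw_gb)  # 512 512 16384
print(round(15.36 / 10, 3))             # 1.536: "half as much again" as a 10 TB disk
```

The raw flash works out to a bit more than the 15.36 TB on the label; the excerpt doesn't say what the difference is used for.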

 

No pricing was given, which is good; otherwise a lot of us might be driven to tears. There is hope that something we can afford is coming, though:

 

 

Later this year Samsung will ship 480GB, 960GB, 1.92TB, 3.84TB, and 7.68TB versions of the drive. It expects this drive "to rapidly become the overwhelming favourite over hard disks for enterprise storage systems," not differentiating between performance and capacity data storage in its statement.

 

Plans are already being made for a 30 TB-plus SSD, too. If they drop the 2.5-inch form factor and go to 3.5- or even 5.25-inch formats, you are looking at incredible storage capacity. I'm not sure even SATA III can move data fast enough to make full use of that capacity on a single device!
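A rough check on the SATA III question, as a Python sketch: SATA's 6 Gb/s line rate uses 8b/10b encoding, so at best about 600 MB/s of data gets through. The sketch computes how long it would take just to read a whole drive once at that ideal rate; real drives and workloads will be slower, and the function names are mine.

```python
def sata_payload_mb_s(line_rate_gbps):
    """Best-case data rate: 8b/10b encoding sends 10 line bits per data byte."""
    return line_rate_gbps * 1e9 / 10 / 1e6

def hours_to_read(capacity_tb, mb_per_s):
    """Hours to stream an entire drive once at a constant rate (decimal TB and MB)."""
    return capacity_tb * 1e12 / (mb_per_s * 1e6) / 3600

rate = sata_payload_mb_s(6.0)                # 600.0 MB/s for SATA III
print(rate)
print(round(hours_to_read(15.36, rate), 1))  # about 7.1 hours for the 15.36 TB drive
print(round(hours_to_read(30, rate), 1))     # about 13.9 hours for a 30 TB drive
```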



 

Thanks, Stan. I looked for a SMART utility on the Apple website but couldn't find one. I did find several on the open web when I googled for them. I downloaded a couple of them, but when I tried to install them my OS warned me they were not from approved sources and that it's best not to trust them. They are probably okay, but I decided to avoid any possible problems for now, as long as the computer is working just fine.

----Ron


Ron,

SMART health monitoring is built into virtually all current hard drives, and Windows computers as well as Macs have built-in utilities that read that data to assess drive health. You have drive utilities on your Mac that can access it without downloading a thing.

Excerpt:

 

"Checking Hard Drive Health

The first thing you’ll want to do is check the hard drive health, this is done with a process called verification, and it’s quite simple:

 

1. Launch Disk Utility, found within the /Applications/Utilities folder

 

2. Select the Mac hard drive from the left side menu and click on the “First Aid” tab

 

3. Click on “Verify Disk” in the lower right corner and let it run.

You will find the window populating with messages about the drive's health: messages indicating things are fine appear in black, and messages indicating something is wrong appear in red."

 

Go here for the full article from OSX Daily: http://osxdaily.com/2012/05/24/check-hard-drive-health-mac-disk-utility/

 

And this one, if you don't mind using the command line: http://hints.macworld.com/article.php?story=20031122041138373



 

Thanks for the info, RV. Regarding the command-line link in your message (the Macworld hint): I ran across that article while looking for a utility yesterday. I tried running that command in the Terminal app, but it did not work; it didn't execute anything or give any response. I suspect that's because it's a very old article and the command probably worked with an earlier version of OS X, but I can't get it to work with OS X 10.11.

Regarding the OSX Daily article: it is from 2012. The version of Disk Utility furnished with OS X 10.11 is a little different but works about the same. The current version just reports that the SMART status is "Verified" and doesn't give any visibility into the details of the report.


Thanks again, RV. I don't know why I didn't find this "SMARTReporter" app when I searched the App Store. I tried the app, and it works with my OS X 10.11. The reports produced by the app are very basic, pretty much just a pass/fail report. A little more research tells me that drive manufacturers implement SMART in many different ways and expose varying amounts of detail through individual data items called attributes. The app can only report what the drive makes available. So my conclusion at this point is that the app works fine, but the Apple SSD SM512E probably isn't testing for, or reporting, any of the detailed "attributes" that would be interesting to know.

 

The test runs periodically, and other than the date and time, every report I've seen has been exactly the same:

 

Mar 6 08:57:22 SMARTReporter[749] <Info>: Drive: 'APPLE SSD SM512E ( | S118NYACA05126 | disk0)' Status: SMARTOK (S.M.A.R.T. condition not exceeded, drive OK)

 

If anyone has a different understanding about this please chime in.

 

ron


Archived

This topic is now archived and is closed to further replies.
