
August 7th, 2007

50 ways to lose your data

Posted by Robin Harris @ 12:45 pm

Categories: Disk drives, RAID

Tags: Disk, Data, Robin Harris

Apologies to Paul Simon
Disk drives are marvelous devices. Especially when they go “clunk” and stop working. I’m not kidding: at least you know your data is hosed. I prefer that to the silent data corruption you don’t find out about until you can’t access a file or your OS starts freezing. Or a RAID rebuild fails.

Silent data corruption is common
You just don’t know it. Many low-end RAID controllers don’t report problems, figuring you’ll never notice. If you do notice, months later, what is the chance that you’ll know it was the controller’s fault?
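One way to notice silent corruption sooner is to record a checksum for every file and compare again later. The following is a minimal sketch, not a feature of any controller or drive: the function names (`file_digest`, `build_manifest`, `changed_files`) are made up for illustration, and it assumes the files in question are not supposed to change between checks.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its current digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, root):
    """List files that are missing or no longer match the saved manifest."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in old_manifest.items()
        if current.get(name) != digest
    )
```

Build the manifest once, keep it somewhere other than the disk being checked, and run `changed_files` against it periodically. A file you edited on purpose will also show up, so this only flags candidates; it cannot say whether the controller, the drive or you changed the bytes.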

Backup is better than insurance
Insurance is designed to protect you against damaging but uncommon events. But data loss is very common. Backup isn’t insurance. It is simple digital hygiene. You’ll use it again and again.

What are disks made of?
Hard drives sit at the bottom of a stack of hardware and software that usually gets your data from your CPU to the disk and back. But there are a lot of places where things can go wrong.

Here’s a partial list:
Media: those beautifully plated silver disks are subject to a couple of major problems:

  • Flipped bits: when a read-only track sits next to a frequently written track, the extraneous magnetic field from the writes weakens the magnetization of the read-only bits until your disk can’t read them. Normally disk ECC corrects these errors, but not always.

    This is why disk fanatics periodically zero-out their disks and reload all their data. I’m not recommending this, just noting the practice.

  • Physical problems, like a piece of dust, can scratch the disk and/or create enough heat so the head stops reading momentarily. Depending on severity the disk may remove that block from use or begin a death spiral into oblivion.

Wear out: disks have a lot of moving parts. In a 7200 RPM drive the disks spin 120 times per second, compared with about 8 times per second (500 RPM) for a CD drive. After a few years the motor can start to go. It may become slightly erratic, so some bits get squeezed and others get smeared.

The arm that moves the heads can move dozens of times per second. When the bearings get loose it can go off track and corrupt data on adjacent tracks.

Electrical: if the drive power supply fails your drive will shut down. But if it is slowly degrading it can create extra heat or power surges that affect already marginal components. Component failures leading to sudden death are not seen by SMART reporting, which is one reason why SMART isn’t much use.

Software: drives contain small computers that run on several hundred thousand lines of code. Is that code bug free? Need you ask? Among the more common bugs - and let’s not get started on the less common ones - are:

  • New code that fixes a problem and accidentally breaks old code.
  • Putting the right data in the wrong place.
  • Phantom writes that are reported as written but, oops!, aren’t.
  • Cache management bugs that munge data, or return correct data to the wrong place.
  • OK, this is less common, but sometimes the on-disk ECC miscorrects the data. ECC is software, right? How do you know it always works correctly? You don’t.
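To see how an error-correcting code can "correct" data into something wrong, here is a toy sketch using a Hamming(7,4) code. This is an assumption made purely for illustration: real drives use far stronger codes, and the bugs described above may have other causes. The point is only that a code built to fix one flipped bit will confidently produce the wrong answer when two bits flip.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Fix the single bit the syndrome points at, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 0 means "no error seen"
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the code believes is wrong
    return [c[2], c[4], c[5], c[6]]


def flip(codeword, *positions):
    """Return a copy of the codeword with the given 1-based positions flipped."""
    c = list(codeword)
    for pos in positions:
        c[pos - 1] ^= 1
    return c


original = [1, 0, 1, 1]
stored = encode(original)

one_error = decode(flip(stored, 5))      # one flipped bit: repaired
two_errors = decode(flip(stored, 3, 5))  # two flipped bits: "repaired" wrongly

print(one_error == original)   # True
print(two_errors == original)  # False
```

In the two-error case the decoder reports nothing unusual. It flips a third bit and hands back data that differs from what was written, which is exactly the kind of failure you cannot see from the outside.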

Bus controllers: whether managing IDE, ATAPI, SATA, SSA or FC, controllers are small computers running code. Bugs in controller code have corrupted data in the past and will no doubt do so again.

RAID controllers: again, small computers running code subject to bugs, as well as all manner of electrical, connector and cable problems. One insidious problem is corruption of RAID 5 parity data. It is pretty simple to check a file by reading it and matching the metadata. Checking parity data is much more difficult, so you typically won’t see parity errors until a rebuild. Then, of course, it is too late.
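The parity problem can be sketched in a few lines. This is a simplified model, assuming one stripe of three equal-sized data blocks plus one XOR parity block; the helper names `xor_blocks` and `rebuild` are invented for the example and do not correspond to any controller's firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Reconstruct the one missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe: three data blocks and their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
good_parity = xor_blocks(data)

# Silently flip one bit in the stored parity block.
bad_parity = bytes([good_parity[0] ^ 0x01]) + good_parity[1:]

# Ordinary reads never touch parity, so every file still looks fine.
# The damage only surfaces when the second disk dies and must be rebuilt.
survivors = [data[0], data[2]]
rebuilt_ok = rebuild(survivors, good_parity)
rebuilt_bad = rebuild(survivors, bad_parity)

print(rebuilt_ok == data[1])   # True
print(rebuilt_bad == data[1])  # False
```

Checking `xor_blocks(data) == parity` for every stripe while all disks are still healthy would catch the bad parity early, but it means reading every block on every disk, which is why the error usually goes unseen until a rebuild.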

The Storage Bits take
While this list is admittedly incomplete - and fewer than 50 if you’re counting - I’m hoping it will help readers understand why backing up your data is worth the time and money. Modern data storage is a miracle of mass-produced high technology, but it isn’t perfect. Disks will fail. Power will surge. Bugs will surface. You can’t avoid them.

What you can avoid is losing your data. If you don’t already have a cheap external USB drive, go buy one and at least store your documents and email on it. You won’t regret it.
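Copying to the external drive can be as simple as the sketch below. It is a minimal example, not a replacement for real backup software: `backup_tree` is a made-up name, the paths in the usage note are placeholders, and the read-back comparison may be served from the operating system's cache rather than from the USB drive itself.

```python
import filecmp
import shutil
from pathlib import Path


def backup_tree(source, destination):
    """Copy every file under source into destination.

    Returns the source files whose copies did not compare equal afterwards.
    """
    source, destination = Path(source), Path(destination)
    mismatched = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = destination / src.relative_to(source)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents and timestamps
        # shallow=False forces a byte-by-byte comparison of the two files.
        if not filecmp.cmp(src, dst, shallow=False):
            mismatched.append(str(src))
    return mismatched
```

Calling something like `backup_tree("/home/me/Documents", "/mnt/usb/Documents")` (both paths hypothetical) returns an empty list when every copy matched its original.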

Next: some more ways our systems lose data and what vendors can do - and I know at least one of them is doing - to protect our data from silent data corruption.

Comments welcome, of course. As I was writing this a friend called me in a panic saying “I think my hard drive is going out!”

“Good thing you have it backed up,” I said. Of course, he didn’t. He’s out buying a USB drive this very minute.

Robin Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small. See his full profile and disclosure of his industry affiliations.

