August 8th, 2007

How Microsoft puts your data at risk

Posted by Robin Harris @ 4:17 pm

Categories: Disk drives, Infrastructure, Software

Tags: NTFS, File System, Microsoft Corp., Robin Harris

56% of data loss due to system & hardware problems - Ontrack
Data loss is painful and all too common. Why? Because your file system stinks. Microsoft’s NTFS (used in XP & Vista) with its de facto monopoly is the worst offender. But Apple and Linux aren’t any better.

Everyone knows what the problems are AND high-end systems fixed many of them years ago. Yet only one desktop vendor is moving forward, and they aren’t based in Redmond. Here’s the scoop.

Y2k got fixed. File systems didn’t.
That may sound harsh. But with all the lip-service paid to innovation - especially in Redmond - you’d think that sometimes we’d see some, especially in core technology. After all, more than half of all data loss is caused by system and hardware problems that the file system could recover from - but doesn’t.

Instead we’re using 20-year-old technology that, like the two-digit year that led to the Y2K drama, was designed for a world of scarce storage, small disks and limited CPU power. Unlike Y2K though, we are living with, and paying for, these compromises every day with lost data, corrupted files, lame RAID solutions and hinky backup products that seem to fail almost as often as they work.

File systems? I should care because . . .
You rely on your file system every time you save or retrieve a document. It is the file system that keeps track of all the information on your computer. If the file system barfs, your data is the victim. And you get to pick up the pieces.

As documented in my last two posts (see How data gets lost and 50 ways to lose your data), PC and commodity server storage stacks are prone to data corruption and loss, much of it silent. Only your file system is positioned to see and fix these problems. It doesn’t, of course, but it could.

And you enterprise data center folks, smirking over the junk consumers get, don’t be too smug. Some of your costly high-end storage servers have NTFS or Linux FS’s under the hood as well. And no, RAID doesn’t fix these problems. According to Kroll Ontrack, only a quarter of data loss instances are due to human error - and many of those errors happen in the panic after a loss is discovered.

Hey, I thought machines were supposed to be good at keeping track of stuff? Only if they are built to.

IRON = Internal RObustNess
I came across the fascinating PhD thesis of Vijayan Prabhakaran, IRON File Systems, which analyzes how five commodity journaling file systems - NTFS, ext3, ReiserFS, JFS and XFS - handle storage problems.

In a nutshell he found that all the file systems have

. . . failure policies that are often inconsistent, sometimes buggy, and generally inadequate in their ability to recover from partial disk failures.

Dr. Prabhakaran will see you now
In a mere 155 pages of lucid prose he lays out his analysis of the interaction between hosts and local file systems. It is a clever analysis, especially of the proprietary and unpublished NTFS.

First, inject a lot of errors
Dr. Prabhakaran built an error-injection framework that enabled him to control what kind of errors the file system would see so he could document how the FS handled them. These errors include:

  • Failure type: read or write? If read: latent sector fault or block corruption? Does the machine crash before or after certain block failures?
  • Block type: directory block; super block? Specific inode or block numbers could be specified as well.
  • Transient or permanent fault?
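To make those knobs concrete, here is a minimal Python sketch of an error-injecting block device. It is not Dr. Prabhakaran's framework; the names `Fault` and `FaultyDisk`, the tiny block size and the in-memory storage are all assumptions made for illustration. It only shows how fault type, target block and transient-versus-permanent behavior can be expressed as injectable rules.

```python
from dataclasses import dataclass

BLOCK_SIZE = 16  # hypothetical tiny block size, chosen for readability


@dataclass
class Fault:
    op: str          # "read" or "write"
    block: int       # block number to target
    kind: str        # "error" (I/O failure) or "corrupt" (bad data on read)
    transient: bool  # True: fires once; False: fires on every access


class FaultyDisk:
    """In-memory block device that misbehaves according to injected faults."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.faults = []

    def inject(self, fault):
        self.faults.append(fault)

    def _match(self, op, block):
        # Return the first fault registered for this operation and block.
        for fault in self.faults:
            if fault.op == op and fault.block == block:
                if fault.transient:
                    self.faults.remove(fault)  # a transient fault fires once
                return fault
        return None

    def read(self, block):
        fault = self._match("read", block)
        if fault and fault.kind == "error":
            raise IOError(f"latent sector fault on block {block}")
        data = self.blocks[block]
        if fault and fault.kind == "corrupt":
            # Flip every bit of the first byte to mimic silent corruption.
            data = bytes([data[0] ^ 0xFF]) + data[1:]
        return data

    def write(self, block, data):
        # In this sketch, write faults only support the "error" kind.
        fault = self._match("write", block)
        if fault and fault.kind == "error":
            raise IOError(f"write failure on block {block}")
        self.blocks[block] = data
```

A test harness would inject something like `Fault("read", 1, "corrupt", transient=False)` and then watch whether the code under test notices that block 1 comes back wrong.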

So how did NTFS fare?
Since NTFS is proprietary, Dr. Prabhakaran couldn’t get as deeply into it as the open-source systems. While NTFS doesn’t implement the strongest form of journaling, he found it pretty reliable at letting applications know when an I/O error has occurred. NTFS also retries failed I/O requests more often than the Linux file systems do, which is a good thing given the dearth of retries on Linux.
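For readers who want to see what "retry, then tell the application" looks like, here is a rough Python sketch. It is not NTFS code; `read_with_retry` and `make_flaky_reader` are hypothetical names, and the retry count and delay are arbitrary assumptions. The point is simply that transient errors get absorbed while persistent ones are propagated instead of swallowed.

```python
import time


def read_with_retry(read_block, block, attempts=3, delay=0.0):
    """Call read_block(block), retrying on OSError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return read_block(block)
        except OSError as err:
            last_error = err
            time.sleep(delay)  # give a transient fault time to clear
    # Persistent failure: propagate it so the application hears about it.
    raise last_error


def make_flaky_reader(failures, data=b"payload"):
    """Build a hypothetical reader that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def read_block(block):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise OSError(f"transient error on block {block}")
        return data

    return read_block
```

With `make_flaky_reader(2)` the default three attempts are enough to get the data back; with five failures queued up, the caller gets the `OSError` and can decide what to do next.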

NTFS sanity checking is also stronger than some. Yet he notes that

NTFS surprisingly does not always perform sanity checking; for example, a corrupted block pointer can point to important system structures and hence corrupt them when the block pointed to is updated.

Translation: Bad Thing.

General screw-ups
Dr. Prabhakaran offered a set of general conclusions about the commodity file systems including NTFS:

  • “Detection and Recovery: Bugs are common. We also found numerous bugs across the file systems we tested, some of which are serious, and many of which are not found by other sophisticated techniques.”
  • “Detection: Sanity checking is of limited utility. Many of the file systems use sanity checking . . . . However, modern disk failure modes such as misdirected and phantom writes lead to cases where . . . [a] bad block thus passes sanity checks, is used, and can corrupt the file system. Indeed, all file systems we tested exhibit this behavior.”
  • “Recovery: Automatic repair is rare. Automatic repair is used rarely by the file systems; . . . most of the file systems require manual intervention . . . (i.e., running fsck).”
  • “Detection and Recovery: Redundancy is not used. . . . [P]erhaps most importantly, while virtually all file systems include some machinery to detect disk failures, none of them apply redundancy to enable recovery from such failures.”
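The second point is easier to see in code. The Python sketch below is an illustration under stated assumptions, not anything from the thesis: the `MAGIC` tag, the `Disk` class with its `drop_next_write` switch, and `ChecksummedStore` are all made up for the example. It simulates a phantom write (the disk reports success but stores nothing) and shows that a structural sanity check passes on the stale block, while a checksum kept apart from the block catches it.

```python
import zlib

MAGIC = b"INOD"  # hypothetical type tag at the start of every block


def make_block(payload):
    return MAGIC + payload


def sanity_check(block):
    """Structural check only: does the block look like the right type?"""
    return block.startswith(MAGIC)


class Disk:
    """In-memory disk that can silently drop one write (a phantom write)."""

    def __init__(self):
        self.blocks = {}
        self.drop_next_write = False

    def write(self, n, data):
        if self.drop_next_write:
            self.drop_next_write = False
            return  # reports success, stores nothing
        self.blocks[n] = data

    def read(self, n):
        return self.blocks[n]


class ChecksummedStore:
    """Keeps each block's checksum separately from the block itself."""

    def __init__(self, disk):
        self.disk = disk
        self.checksums = {}  # stands in for checksums held in a parent block

    def write(self, n, data):
        self.checksums[n] = zlib.crc32(data)
        self.disk.write(n, data)

    def read(self, n):
        data = self.disk.read(n)
        if zlib.crc32(data) != self.checksums[n]:
            raise IOError(f"checksum mismatch on block {n}")
        return data
```

Write version 1 of a block, set `drop_next_write`, write version 2, and the disk still holds version 1. It has the right tag, so `sanity_check` is happy, but `ChecksummedStore.read` raises because the stored checksum describes version 2. Detection alone does not recover the data, which is where redundancy comes in.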

Dr. Prabhakaran found that ALL the file systems shared

. . . ad hoc failure handling and a great deal of illogical inconsistency in failure policy . . . such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies.
. . .
We observe little tolerance to transient failures; . . . . none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy.

How doomed are we?
Pretty doomed. But there is some hope.

There are well-known techniques, such as disk scrubbing, checksumming, and more robust ECC, used in high-end systems that could be added to our systems. Not rocket science.
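Here is roughly what scrubbing plus checksums plus redundancy amounts to, as a small Python sketch. It is a toy under loud assumptions: `MirroredStore` is a hypothetical name, the two "disks" are plain dictionaries, and SHA-256 stands in for whatever checksum a real system would use. No real file system is implemented this way line for line.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).digest()


class MirroredStore:
    """Two copies of every block plus a checksum kept outside the copies."""

    def __init__(self):
        self.copies = [{}, {}]  # two hypothetical disks
        self.sums = {}

    def write(self, n, data):
        self.sums[n] = digest(data)
        for copy in self.copies:
            copy[n] = data

    def _intact(self, copy, n):
        return n in copy and digest(copy[n]) == self.sums[n]

    def scrub(self):
        """Verify every copy of every block and repair bad copies.

        Returns a list of (block, copy_index) pairs that were repaired.
        Raises IOError if a block has no intact copy left.
        """
        repaired = []
        for n in self.sums:
            good = [c[n] for c in self.copies if self._intact(c, n)]
            if not good:
                raise IOError(f"block {n}: no intact copy left")
            for i, copy in enumerate(self.copies):
                if not self._intact(copy, n):
                    copy[n] = good[0]  # rewrite the bad copy from a good one
                    repaired.append((n, i))
        return repaired
```

Run `scrub()` on a schedule and a silently corrupted copy gets found and rewritten while a good copy still exists, instead of being discovered months later when the good copy has failed too.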

Young Dr. Prabhakaran now works at Microsoft Research. Perhaps someone up in Redmond will reach out to him to see how NTFS’s aging architecture might be enhanced.

Of course, Microsoft is fine with the status quo until it threatens market share. Internet Explorer’s innovation hiatus after crushing Netscape is a fine example.

So it is good news that Apple has two storage initiatives that will put pressure on Redmond to clean up its act.

  • Time Machine is a beautifully crafted automatic backup utility in Mac OS X 10.5 (Leopard). While it doesn’t solve the data corruption problems that I assume HFS+ has as well, it does make it very easy for regular folks to back up and recover their data. I think small business types will love it.
  • ZFS is the new open-source file system from Sun that Apple is incorporating into OS X. I expect the port won’t be complete for another year, but ZFS is the first file system to offer end-to-end data integrity that can detect and correct such devious problems as phantom writes.

See Apple’s new kick-butt file system for more on ZFS.

The Storage Bits take
As noted in “How data gets lost” more than half of all data loss is caused by system and hardware problems. A high quality file system that took better care of our data could eliminate many of those failures.

The industry knows how to fix the problems. The question is when. With a resurgent Mac pushing ZFS maybe Redmond will see the light sooner, rather than later, and dramatically increase the reliability of all our systems.

It will be interesting to see how Microsofties spin inferior data integrity once ZFS is the OS X default file system. Especially to the enterprise folks for whom data integrity is the ne plus ultra of the data center.

Comments welcome, of course. Itching to read a well-done CompSci PhD thesis? Here’s a link to IRON File Systems. Enjoy.

Update: based on the first couple of commenters, who seem to believe that data loss is a figment of my imagination, I gave more prominence to the factual basis of data loss and added a couple of short quotes from the thesis. I single out Microsoft because their negligence impacts more people than any other company. Maybe, someday, Microsoft will start measuring success in terms of software quality instead of market share.

Robin Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small. See his full profile and disclosure of his industry affiliations.