August 8th, 2007

Realtek silent data corruption caused by firmware

Posted by George Ou @ 4:48 am

Categories: Desktop, Hardware, Networking, Servers

Tags: Data Corruption, Firmware, George Ou

A little more than a week ago, I took Realtek and Microsoft to task over some WHQL-certified Realtek drivers that caused silent data corruption. As it turns out, the problem was actually caused by a firmware flaw in Realtek's hardware. This means the problem would have affected any operating system, regardless of the driver, if it used the large send offload and checksum offload features of Realtek's gigabit network adapter. I thought it was a driver problem because upgrading the driver fixed it, but it turns out the driver update actually updated the firmware in the on-board network adapter. Here's what a Microsoft engineer had to say:

I was wondering if you had a moment to talk about the article you wrote about the Realtek silent data corruption issue and WHQL testing. I have been heading up this investigation on our end and have some interesting reasons as to why this issue was missed. On the surface, it seems that any basic test should have caught this, and that it should never have passed any sort of testing. However, this bug only manifests itself under a fairly unforeseeable set of circumstances. First, checksum offload and large send offload must both be enabled in the driver. Checksum offload and large send offload are things we do test, but what makes the circumstances a little stranger is that a small packet, 58 bytes or less, must be sent before a very large packet. These are things we also test for, small packets and large packets. The problem in testing is that we don't test all of these together at one time. Unfortunately, in a real-life situation this can happen pretty easily if, say, you are running a BitTorrent client while transferring files on your local LAN with both offloads enabled. The tricky part in testing is that we have hundreds of individual tests we run against each driver before it gets certified, and there is a nearly infinite number of combinations in which we could run them. We are working on that, however: a way to run these tests in many combinations.

Now, you are probably wondering why we don't just have a simple test where you transfer some files over background traffic. Well, considering the level of testing we already do, this kind of test is unlikely to find any bug other than this one. However, we are discussing adding such a test just to be safe. Secondly, the tests need to be deterministic. Each of our tests checks for something in particular: checksum offload, large packets, small packets, etc. If a test like moving files over background traffic fails, it does not necessarily tell us what is wrong with the driver, and so it takes a long time to debug the point of failure. In a case such as silent data corruption, it would be obvious what was wrong, but oftentimes it will not be so obvious. Again, though, we are still evaluating these approaches.

At any rate, as always, I would value your opinion on the matter now that you have a more complete story. We are taking this problem very seriously and of course are trying to improve the testing where we can.
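The testing gap the engineer describes is combinatorial: each condition is tested on its own, but the bug only appears when all of them coincide. A minimal Python sketch, assuming each condition can be modeled as a simple on/off switch (the feature names and helper functions are hypothetical and not part of any WHQL tool), contrasts one-feature-at-a-time tests with the full combination matrix:

```python
from itertools import product

# Conditions the engineer says are each tested on their own. Treating each
# one as an on/off switch is an assumption made for illustration only.
FEATURES = (
    "checksum_offload",
    "large_send_offload",
    "small_packet_first",   # a packet of 58 bytes or less sent first
    "large_packet",         # followed by a very large packet
)


def triggers_reported_bug(config):
    """True if a configuration matches the trigger described in the quote:
    both offloads on, and a small packet sent before a very large one."""
    return all(config[name] for name in FEATURES)


def all_combinations(features=FEATURES):
    """Yield every on/off assignment of the features (2 ** n cases)."""
    for values in product((False, True), repeat=len(features)):
        yield dict(zip(features, values))


def single_feature_tests(features=FEATURES):
    """Yield the 'one thing at a time' plan: exactly one feature enabled."""
    for name in features:
        yield {f: (f == name) for f in features}


if __name__ == "__main__":
    isolated = list(single_feature_tests())
    combined = list(all_combinations())
    print("isolated tests hitting the bug:",
          sum(triggers_reported_bug(c) for c in isolated))   # 0
    print("combined tests hitting the bug:",
          sum(triggers_reported_bug(c) for c in combined))   # 1 of 16
    # The full matrix doubles with every feature that is added.
    for n in (4, 10, 20):
        print(f"{n} features -> {2 ** n} combinations")
```

With four switches the full matrix is only 16 cases, but it doubles with every feature added, which is why running hundreds of individual tests in every combination is impractical.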

The main responsibility for testing this goes to Realtek, and I certainly hope they've learned a valuable lesson here and will avoid this kind of bug through more rigorous testing. The secondary responsibility goes to the OS vendor to validate the hardware, though I have to admit that this particular problem was a difficult one to discover for both Realtek and Microsoft. I understand that adding another test procedure makes the testing process a little more time-consuming, but anything's better than forcing customers to beta test the hardware and firmware. I'm happy to hear that Microsoft and Realtek are working together to figure out these problems, and I hope not to see this kind of bug again.
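The kind of extra test procedure discussed here, moving a file over background traffic and checking the result, can be made deterministic by seeding the data and comparing cryptographic digests of what was sent and what arrived. The following is a minimal Python 3.9+ sketch; every name, size and threshold in it is an illustrative assumption, not anything from Microsoft's or Realtek's test suites. On the loopback interface (the default `127.0.0.1` here) traffic never reaches the network adapter, so no offload is exercised; to test a real adapter, the sketch would have to be split so that the receiving half runs on another machine on the LAN.

```python
import hashlib
import random
import socket
import threading

# Assumed small packet: a 16-byte UDP payload plus IPv4 (20), UDP (8) and
# Ethernet (14) headers gives a 58-byte frame, matching "58 bytes or less".
SMALL_PAYLOAD = b"\x00" * 16
PAYLOAD_SIZE = 4 * 1024 * 1024  # hypothetical size of the "file" being moved


def make_payload(seed, size=PAYLOAD_SIZE):
    """Seeded pseudo-random data, so any failure can be reproduced exactly."""
    return random.Random(seed).randbytes(size)


def receive_all(listener, result):
    """Accept one TCP connection and hash everything received until EOF."""
    try:
        conn, _ = listener.accept()
    except OSError:  # includes the accept timeout
        return
    digest = hashlib.sha256()
    with conn:
        while chunk := conn.recv(65536):
            digest.update(chunk)
    result["digest"] = digest.hexdigest()


def send_small_packets(addr, stop):
    """Background traffic: a steady stream of tiny UDP datagrams."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while not stop.is_set():
            sock.sendto(SMALL_PAYLOAD, addr)
            stop.wait(0.001)


def transfer_with_background(host="127.0.0.1", seed=42):
    """Send a large payload over TCP while small UDP packets flow, then
    compare SHA-256 digests of the data sent and the data received."""
    payload = make_payload(seed)
    expected = hashlib.sha256(payload).hexdigest()

    sink = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sink.bind((host, 0))  # receives (and ignores) the background packets
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen(1)
    listener.settimeout(10)

    result = {}
    stop = threading.Event()
    rx = threading.Thread(target=receive_all, args=(listener, result))
    bg = threading.Thread(target=send_small_packets,
                          args=(sink.getsockname(), stop))
    rx.start()
    bg.start()
    try:
        with socket.create_connection(listener.getsockname()) as client:
            client.sendall(payload)
    finally:
        stop.set()
        bg.join()
        rx.join()
        listener.close()
        sink.close()
    return result.get("digest") == expected


if __name__ == "__main__":
    print("transfer intact:", transfer_with_background())
```

A `False` result means the bytes that arrived differ from the bytes that were sent, which is exactly the silent corruption this post is about; a `True` result on loopback only shows that the harness itself works.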

George Ou is Technical Director of ZDNet. See his full profile and disclosure of his industry affiliations.

