August 4th, 2009

Why writing a Windows compatible file server is (still) hard

Posted by Jeremy Allison @ 3:00 am

Categories: General

Tags: Packet, Network, Common Internet File System, Samba, Client, Microsoft Corp., File Server, File, POSIX, POSIX System

[The opinions expressed here are mine alone, and not those of Google, Inc., my current employer.]

I don’t often write about my day-to-day work, but sometimes I run across a problem so intransigent that finally fixing it feels like a triumph. If you take an engineering job in the software industry, this is the kind of thing you might end up working on. If you find this column fun and interesting, then you might be a good candidate for a network engineer. Even if you don’t, I hope you’ll appreciate the insane level of detail network engineers have to know on your behalf, to make something as simple as “saving a file” work seamlessly across operating systems.

One of the remedies imposed on Microsoft after they lost the European Union workgroup-server antitrust case was the requirement to publish the full specifications for third-party software to interoperate with their operating systems. They are still in the process of doing this, but there are now thousands of pages of documentation out there, in theory fully specifying the Server Message Block/Common Internet File System (SMB/CIFS) protocol that Samba and Windows file servers implement. So surely anyone and their auntie (assuming your auntie is a network engineer :-) can now write their own SMB/CIFS server by just reading this copious documentation. After all, now that it’s all documented, how hard can it be?

A bug I fixed this week illustrates why I still think Samba is the leading choice for interoperability between Windows and Linux/UNIX systems. It concerns a strange tale of Microsoft Office and the “Offline Files” remote synchronization feature. “Offline Files” in Microsoft Windows allows a user to save a version of a file they’re working on from a remote file system on their local laptop, and have it re-synchronized to a server when they get back online.

A user of Samba reported a bug that showed conclusively that trying to synchronize a Microsoft Office file against a Samba server wasn’t working. The Windows client “Sync Center” application kept telling the user that the file on the remote Samba disk had been changed since it was saved, and he knew this wasn’t the case.

It got stranger. It only happened with Vista, not with XP or Windows 2003. It only happened with Microsoft Office 2003 (all other versions of Office worked fine). It only reliably happened with Microsoft Excel, no other Microsoft Office application. Have I mentioned how much I hate Microsoft Excel? I quake in fear whenever I see an Excel interoperability bug logged against Samba. That application is perverse in the things it will do to a remote file server.

I looked at my nice new shiny downloaded Microsoft documentation. There was nothing related to this problem in there. The document describing the precise behavior of an NTFS filesystem as seen over the wire from an SMB/CIFS server is yet to be finished. They’re still working on it. OK, so let me check what happens when you use this version of Excel to do the very same thing against Windows. Maybe it’s a real bug that fails against a Microsoft file server too; stranger things have been known. No, it worked fine against a Windows 2003 server, which to be honest did not surprise me. Microsoft tests the hell out of Microsoft Office before shipping any software that interacts with it in any way.

Time to get out the big guns: a debug log from Samba at our highest logging level, and a network packet capture trace (using the Open Source software “wireshark”) of when the problem was happening. Looking at the log didn’t show any obvious errors, other than the fact that Excel does an insane number of operations over the network to do something as simple as a “Save File” (if you’ve ever wondered why Excel is slow, look at what it does over a network). A brief glance at the network capture trace didn’t help either. Everything looked fine, except that on the save operation to the Samba server, Excel strangely decided to abort halfway through.

This was getting more interesting. It seemed to be a generic failure of the “Save” operation, nothing to do with the “Sync” feature at all. So let’s test saving an Excel file against a Samba share without the “Sync” feature turned on in the client. Surely this must work, since we never ship a version of Samba without testing against Microsoft Office. Yes indeed, a normal save worked fine. So it was something to do with the “Sync” feature. But what could it be?

The only thing to do was to do a second wireshark trace from the client to a Windows 2003 server, and then compare the two packet traces, the “bad” against the “good”, packet by packet.

Except of course it’s not that easy (nothing in Windows interoperability ever is :-). Due to the differences in response times between servers, slight differences in supported features, and of course the fact that the Samba architecture is completely different from that of the Windows CIFS server, the packet streams soon become very different. But after you’ve been doing this work for 17 years, you start to recognize the fingerprints of the broad actions that clients are trying to do, even with a protocol as chatty on the network as SMB/CIFS.
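To make the idea of comparing "fingerprints" concrete, here is a minimal sketch that aligns two lists of high-level operations and reports only where they differ. It assumes each trace has already been reduced by hand to a list of operation names. The names below are invented for illustration and are not real SMB/CIFS command names, and the sketch does not parse wireshark captures.

```python
from difflib import SequenceMatcher


def diff_ops(good_ops, bad_ops):
    """Return the spans where two operation lists differ.

    Each entry is (tag, ops_only_in_good, ops_only_in_bad), where tag is
    one of difflib's opcodes: 'replace', 'delete' or 'insert'.
    """
    matcher = SequenceMatcher(a=good_ops, b=bad_ops, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, good_ops[i1:i2], bad_ops[j1:j2]))
    return differences


# Hypothetical, hand-summarised traces (illustrative names only).
good = ["create", "write", "set_create_time", "change_notify", "close"]
bad = ["create", "write", "set_create_time", "close"]

for tag, in_good, in_bad in diff_ops(good, bad):
    print(tag, in_good, in_bad)
```

An alignment like this only narrows down where to look. Deciding which differences are harmless (timing, optional features) and which matter still takes a human who knows the protocol.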

It took a couple of weeks of staring at the packet traces, on and off, but I eventually narrowed it down to a difference once Excel had written a temporary file out to the remote disk. Things started to be very different (and obviously wrong) at that exact point. So I started to look at the packets very closely.

The client was trying to set a “created” time stamp, to make the temporary file pretend to have been created at exactly the same time as the original file. Now one of the interesting things in writing Samba is that it has to run on top of POSIX. A POSIX system is very different from Windows, so one of the challenges we have is to be able to emulate the different Windows features on top of standard POSIX.

A POSIX file system doesn’t have a “create” time stamp, so when we’re reporting back to Windows when a file was created, we have to look at all the available time stamps from the system, and just pick the earliest. This has always worked in the past, but maybe we’d finally run into a situation where we need that exact create time stamp as set by the client.
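The "pick the earliest" rule can be sketched in a few lines. This is an illustration of the idea only, not Samba's actual code. The function name `emulated_create_time` is made up, and it assumes the three time stamps POSIX `stat` provides (access, modification, status change) are the only ones available.

```python
import os
import tempfile


def emulated_create_time(path):
    """Return a stand-in "create" time: the earliest POSIX time stamp on path."""
    st = os.stat(path)
    # POSIX offers access, modification and status-change times,
    # none of which is a true creation time.
    return min(st.st_atime, st.st_mtime, st.st_ctime)


# Usage: push atime and mtime into the past; ctime stays at "now",
# so the earliest of the three is the access time we set.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.utime(demo_path, (1_000_000_000, 1_000_000_500))
print(emulated_create_time(demo_path))
os.remove(demo_path)
```

The weakness is visible in the usage lines: any later change to a time stamp can move the reported "create" time, and a value set by the client has nowhere to live.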

So I spent part of a day adding a temporary “created” time stamp into Samba, only held in memory. If this worked and fixed the bug I’d then find somewhere to store this on disk (probably in an “extended attribute”).
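A rough sketch of what "only held in memory" could look like follows. It assumes a plain dictionary keyed by device and inode number; the helper names are hypothetical and nothing here is taken from the Samba source. The value is lost when the process exits, which is why a persistent store such as an extended attribute would be the next step.

```python
import os
import tempfile

# Hypothetical in-memory store: (device, inode) -> create time set by a client.
_create_times = {}


def set_create_time(path, when):
    """Remember the create time a client asked for (lost on restart)."""
    st = os.stat(path)
    _create_times[(st.st_dev, st.st_ino)] = when


def get_create_time(path):
    """Return the client-set create time, else the earliest POSIX time stamp."""
    st = os.stat(path)
    fallback = min(st.st_atime, st.st_mtime, st.st_ctime)
    return _create_times.get((st.st_dev, st.st_ino), fallback)


# Usage: a client-supplied value overrides the "earliest time stamp" guess.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
set_create_time(demo_path, 1_234_567_890)
print(get_create_time(demo_path))
os.remove(demo_path)
```

Keying on device and inode rather than on the path means the remembered value follows the file across renames within the same file system, which matters for the temporary-file-then-rename pattern applications use when saving.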

No, this still didn’t fix it. This was starting to make me very angry as it made no sense. I stared at the packet traces again. Even more closely. Then something jumped out at me.

The SMB/CIFS protocol has a feature where a client can be notified when a change is made on a remote file or directory. It’s called a “change notify”. Normally it’s used to allow a client to discover when another client is modifying the same file system (it’s the reason Windows “Explorer” windows spontaneously refresh with new files if a work colleague modifies the directory you’re looking at). But even if a client modifies the file itself, the server still must send “change notify” packets to let the client know a file it has just requested to be modified has actually been modified. At that point in the packet stream, just after the create time stamp change was requested, the Windows server was sending a “change notify” packet, but the Samba server was sending the “change notify” after the file was written to instead. It was exactly the same packet, so surely that couldn’t be the problem?

I looked at our code. As POSIX can’t store a created time stamp, if the client requests it to be changed (and no other time stamps) we simply return a success code. But we weren’t sending a “change notify” back after this request, as technically we weren’t changing the time at this point. Instead we were sending it back after the file write, when we were changing the file. So I added code to send the “change notify” back after the time stamp change.
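The difference in ordering can be shown with a toy model. This is a sketch under stated assumptions, not Samba code: the class, its methods and the tuple "packets" are all invented for illustration. It also assumes that a write still triggers its own notify after the fix, which the description above does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy model of the request handling described above (hypothetical)."""

    notify_on_create_time: bool  # False = old behaviour, True = fixed behaviour
    sent: list = field(default_factory=list)  # packets "sent" to the client, in order

    def set_create_time(self, name, when):
        # POSIX cannot store the value, so the request succeeds as a no-op.
        self.sent.append(("SUCCESS", "set_create_time", name))
        if self.notify_on_create_time:
            # The fix: tell the client the file "changed" at this point too.
            self.sent.append(("CHANGE_NOTIFY", name))

    def write(self, name, data):
        self.sent.append(("SUCCESS", "write", name))
        # Assumption: a real modification always produces a notify.
        self.sent.append(("CHANGE_NOTIFY", name))


# Usage: replay the same two client requests against both behaviours.
for fixed in (False, True):
    server = ToyServer(notify_on_create_time=fixed)
    server.set_create_time("book.tmp", 0)
    server.write("book.tmp", b"data")
    print("fixed" if fixed else "old", server.sent)
```

In the old behaviour the first notify the client sees arrives only after the write. In the fixed behaviour one arrives directly after the time stamp request, matching what the Windows server did in the trace.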

And the bug disappeared!

I went into one of my colleagues’ offices and kicked the hell out of one of the much loved Google beanbags, all the while screaming obscenities into the air for a good five minutes. He looked on with bemused amusement. I finally calmed down enough to explain the problem. One packet being returned at the wrong time. One single mis-timed packet caused a ripple effect in the Windows client file system software that was seen all the way up in the complex user interface of only that particular version of Excel, when interacting with the “Offline Files” feature, only on Windows Vista.

The remaining task was to add a regression test into our test suite, so that this specific bug is tested for before we release any new versions of Samba. The code isn’t done until it’s properly tested. But at least the user is now happy.

Interoperability with Windows is hard. But somebody has to do it. And if you’re going to do something, you might as well try and do it well (and try and have some fun at the same time :-).

Stop the press. As I go to publish this, the user still occasionally reports the failure even with the patch, just not as often. Looks like there may be a secondary timing effect in play as well. Oh well, no one can say this job is dull.

Jeremy Allison is one of the lead developers on the Samba Team, a group of programmers developing an Open Source Windows compatible file and print server product for UNIX systems. Developed over the Internet in a distributed manner similar to the Linux system, Samba is used by all Linux distributions as well as many thousands of corporations worldwide. Jeremy handles the co-ordination of Samba development efforts and acts as a corporate liaison to companies using the Samba code commercially. He works for Google, Inc., which funds him to work full-time on improving Samba and solving the problems of Windows and Linux interoperability.

