May 23rd, 2007

Warp speed file serving with pNFS

Posted by Robin Harris @ 3:41 pm

Categories: Clusters, Infrastructure

Tags: Network File System, Network, File System, Storage, Server, Robin Harris

Files: quickly getting bigger. Networks: slowly getting faster. Something’s got to give. Here’s the scoop.

Parallel NFS: standards-based parallel file serving
The Network File System (NFS) is the oldest NAS (Network Attached Storage) protocol. Developed by Sun in the ’80s and made an open standard, NFS makes files on the network available anywhere.

Small files: great. Big files: lo-o-o-ng time coming
NAS is popular because it uses cheap, reliable and reasonably fast Ethernet instead of cranky, expensive and very fast Fibre Channel. NFS is very popular as the storage protocol for compute clusters. Yet as data sets and file sizes have grown, the relative speed of Ethernet just hasn’t kept up.

I worked with some oil companies doing reservoir modeling about six years ago. Even then it was taking them 6-10 hours just to move data from one stage of their workflow to the next. It was killing them.

With 10 gigabit Ethernet coming up, our problems should be solved, right? But no: NFS had a tough time scaling even to gigabit Ethernet. That’s why you see TCP Offload Engines (TOEs), custom hardware pipelines and other costly go-fast goodies on gigE storage.

Enter the dragon
The Internet Engineering Task Force is the NFS standards body. About four years ago they started developing a parallel version of NFS to enable much higher speeds. The new standard, NFS v4.1, should reach final draft status later this year. Some early birds may be out with products late this year as well.

How NFS works
Standard NFS file servers work like your PC does: the files are on local disks, and the computer keeps track of their location, name, creation and modification dates, size and so on. That information is called metadata, which means data about your data.

When you request a file, the file server receives the request, looks up the metadata, converts it to disk I/O requests, collects the data and then ships it over the network to you. With small files most of the time is spent collecting the data.

With big files the data transmission time becomes the limiting factor. What if you could break a big file into pieces and ship it in parallel to a compute server? That would be faster, especially with several parallel connections.

That’s exactly what parallel NFS (pNFS) does.
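To put rough numbers on that idea, here is a back-of-the-envelope sketch in Python. The function name, the 1 TB file size and the `efficiency` parameter are my own illustrative assumptions, not anything from the NFS spec, and the math ignores protocol overhead, disk speed and contention.

```python
GIGE_BITS_PER_SEC = 1_000_000_000  # nominal gigabit Ethernet line rate


def transfer_seconds(file_bytes, link_bits_per_sec, links=1, efficiency=1.0):
    """Idealized time to move a file over `links` equal parallel links.

    `efficiency` is a hypothetical fudge factor (0 < efficiency <= 1)
    for the fraction of line rate you actually get.
    """
    if links < 1:
        raise ValueError("need at least one link")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    usable_bits_per_sec = link_bits_per_sec * efficiency * links
    return file_bytes * 8 / usable_bits_per_sec


one_terabyte = 10**12
print(transfer_seconds(one_terabyte, GIGE_BITS_PER_SEC))           # 8000.0
print(transfer_seconds(one_terabyte, GIGE_BITS_PER_SEC, links=4))  # 2000.0
```

Even at full line rate, a terabyte over one gigE link is more than two hours. Four links in parallel cut that to about 33 minutes, which is the whole appeal.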

How pNFS works
pNFS splits the NFS file server into two types of servers: a metadata and control server, and as many storage servers as you can afford. Together the control server and the storage servers form a single logical NFS server with a slew of network connections. The compute server, which is likely to be a Beowulf cluster, has plenty of Ethernet ports as well.

So the compute server requests a file using the new v4.1 NFS client. The NFS control server receives the request and looks up where the file chunks reside on the various storage servers. It sends this information, called a layout, back to the NFS v4.1 client, which then tells its cluster members where to get the data. The cluster members then, using the layout, request the data directly from the storage servers.
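The request flow above can be mimicked with a toy in-memory model. To be clear, this is a sketch under my own assumptions: the class names, the 4-byte stripe size, the round-robin placement and the `store` helper are all hypothetical, and none of it is the actual NFS v4.1 wire protocol. It only shows the shape of the idea: ask the control server for a layout, then pull the chunks straight from the storage servers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE = 4  # bytes per chunk; tiny so the example is easy to follow


class StorageServer:
    """Holds file chunks only; knows nothing about whole files."""

    def __init__(self):
        self.chunks = {}  # (filename, chunk index) -> bytes

    def read(self, filename, index):
        return self.chunks[(filename, index)]


class ControlServer:
    """Holds metadata only: which storage server has which chunk."""

    def __init__(self, storage_servers):
        self.storage = storage_servers
        self.layouts = {}  # filename -> list of server ids, one per chunk

    def store(self, filename, data):
        # Helper to seed the example: stripe the data round-robin.
        layout = []
        for index, start in enumerate(range(0, len(data), STRIPE)):
            server_id = index % len(self.storage)
            chunk = data[start:start + STRIPE]
            self.storage[server_id].chunks[(filename, index)] = chunk
            layout.append(server_id)
        self.layouts[filename] = layout

    def get_layout(self, filename):
        return list(self.layouts[filename])


def parallel_read(control, filename):
    """Client side: fetch the layout, then read chunks directly in parallel."""
    layout = control.get_layout(filename)

    def fetch(item):
        index, server_id = item
        return control.storage[server_id].read(filename, index)

    with ThreadPoolExecutor(max_workers=len(control.storage)) as pool:
        # pool.map returns results in chunk order, so join reassembles the file.
        return b"".join(pool.map(fetch, enumerate(layout)))


servers = [StorageServer() for _ in range(3)]
control = ControlServer(servers)
control.store("big.dat", b"abcdefghijklmnop")
print(control.get_layout("big.dat"))      # [0, 1, 2, 0]
print(parallel_read(control, "big.dat"))  # b'abcdefghijklmnop'
```

Note that the control server never touches the file data on the read path. It hands out the map and gets out of the way.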

If you’ve got 10 storage servers for a 10 node cluster, you will see something close to a 10x increase in speed. 100 of each and you’ll see close to 100x increase. It is almost magic.
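One way to sanity-check that scaling claim is a simple bottleneck estimate. This is an idealized sketch of my own, not a benchmark: it assumes every storage server and every cluster node has one equal-speed link, and the `efficiency` factor is a made-up knob for whatever overhead you expect.

```python
def ideal_speedup(storage_servers, cluster_nodes, efficiency=1.0):
    """Upper-bound speedup over a single server feeding a single link.

    The smaller side is the bottleneck: 10 storage servers cannot
    feed a 4-node cluster any faster than 4 links' worth.
    """
    if storage_servers < 1 or cluster_nodes < 1:
        raise ValueError("need at least one server and one node")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return min(storage_servers, cluster_nodes) * efficiency


print(ideal_speedup(10, 10))  # 10.0
print(ideal_speedup(10, 4))   # 4.0
```

The takeaway is that both sides have to grow together to get the near-linear gain.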

AND it’s backward compatible
You’ll still be able to access the data even with a lowly PC. Your NFS client makes the request, the control server gathers the data itself, and sends it on to you. Except for the fact that it is slower than pNFS, you’ll never know the difference.
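Here is a minimal sketch of that fallback decision, again using hypothetical in-memory stand-ins rather than the real protocol. The `STORAGE` and `LAYOUT` tables and the `handle_read` function are assumptions for illustration only.

```python
# Hypothetical stand-ins for two storage servers and one file's layout.
STORAGE = {
    0: {0: b"abcd", 2: b"ijkl"},  # storage server 0 holds chunks 0 and 2
    1: {1: b"efgh"},              # storage server 1 holds chunk 1
}
LAYOUT = [0, 1, 0]  # chunk index -> storage server id


def handle_read(client_supports_pnfs):
    """Control server's choice: hand out the layout, or gather the data itself."""
    if client_supports_pnfs:
        # Parallel-capable client: return the map and let it fetch directly.
        return ("layout", list(LAYOUT))
    # Older client: the control server collects every chunk and sends
    # the whole file over its own connection, which is the slower path.
    data = b"".join(
        STORAGE[server_id][index] for index, server_id in enumerate(LAYOUT)
    )
    return ("data", data)


print(handle_read(True))   # ('layout', [0, 1, 0])
print(handle_read(False))  # ('data', b'abcdefghijkl')
```

Either way the client ends up with the same bytes, which is why applications don't need to change.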

No changes to applications either. The IETF team did a good job on this one.

The Storage Bits take
pNFS is going to be very popular in the large-scale high performance computing cluster space. These clusters are so big that adding just a few hundred bucks per node for some tweak quickly adds up.

I fantasize about a home pNFS array for video editing: stick four gigE ports on my local machine and editing large files wouldn’t be nearly as painful. But that is a ways off. For the big clusters though, a new day is starting to dawn.

Comments welcome, of course. Like reading specs? The IETF NFS v4.1 specs page will make your day.

Robin Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small. See his full profile and disclosure of his industry affiliations.



  • Talkback
  • Most Recent of 4 Talkback(s)
Not the first, but the one with the most potential
Parallel and shared disk filesystems, both of which use the concept of separating metadata from data, have been around for a long time. IBM GPFS, SGI CXFS, and LSF (now Sun) QFS were part of this tre...
Posted by: meh130@... Posted on: 06/12/07
BitTorrent FS, anyone?  Thought1 | 05/24/07
Hadn't thought of it that way, but you're right.  R HarrisZDNet Moderator | 05/24/07
Similiar to Striping??  nixstor | 05/25/07
Not the first, but the one with the most potential  meh130@... | 06/12/07
