
September 24th, 2009

Why you should be glad about Gmail failures

Posted by Phil Wainewright @ 10:57 am

Categories: Collaboration, Customer experience, Google, Service level management, Uncategorized

Tags: Google Gmail, Router, E-mail Providers, Cloud Computing, Internet, Phil Wainewright

Gmail is having problems again today and some users are squirming while others aren’t worried.

Of course it’s a hassle when Gmail’s not there any more — I found my work rhythm was interrupted and instead of writing and sending some emails as I’d planned, I had to switch to another task and they’re still sitting on my to-do list now. But the way I look at it, every Gmail outage is a small investment I’m willing to make towards a future when I’ll be able to take its reliability utterly for granted.

With every Gmail fail, Google learns more about operating a cloud-scale, enterprise-class email infrastructure. While it may be true that Hotmail and Yahoo! Mail have more registered users and traffic, neither of them is trying to attract enterprise customers as Google is with its Google Apps suite (of which Gmail is the flagship application). That means no one has ever attempted what Gmail is now doing, and with each slip-up along the way, it learns how to do it better.

Remember the big outage that affected the Gmail web interface on the 1st of this month? Here’s what the Gmail team posted about it later that day:

“This morning (Pacific Time) we took a small fraction of Gmail’s servers offline to perform routine upgrades. This isn’t in itself a problem — we do this all the time, and Gmail’s web interface runs in many locations and just sends traffic to other locations when one is offline.

“However, as we now know, we had slightly underestimated the load which some recent changes (ironically, some designed to improve service availability) placed on the request routers, servers which direct web queries to the appropriate Gmail server for response. At about 12:30 pm Pacific a few of the request routers became overloaded and in effect told the rest of the system ‘stop sending us traffic, we’re too slow!’. This transferred the load onto the remaining request routers, causing a few more of them to also become overloaded, and within minutes nearly all of the request routers were overloaded.

“We’ve turned our full attention to helping ensure this kind of event doesn’t happen again. Some of the actions are straightforward and are already done — for example, increasing request router capacity well beyond peak demand to provide headroom. Some of the actions are more subtle — for example, we have concluded that request routers don’t have sufficient failure isolation (i.e. if there’s a problem in one datacenter, it shouldn’t affect servers in another datacenter) and do not degrade gracefully (e.g. if many request routers are overloaded simultaneously, they all should just get slower instead of refusing to accept traffic and shifting their load). We’ll be hard at work over the next few weeks implementing these and other Gmail reliability improvements …”

You see what I mean? Learning, learning, learning from every glitch, and as soon as the solution is found it’s implemented to the benefit of every one of Gmail’s millions of users. As my friend and fellow Enterprise Irregular Anshu Sharma wrote a while back, this is one of the unsung benefits of multi-tenancy. The disasters may be high-profile, but that just incents the provider even more to avoid them in the future. By contrast, a software vendor of on-premise, single-tenant applications has little incentive to fix problems that only affect one customer at a time, even if the aggregate outage time is far more severe once you add up the results of each individual failure.
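
For readers who want to see why the Gmail team singled out graceful degradation, here is a minimal sketch in Python of the cascade described in their post. It is a toy model and not Google's actual system: the router capacities, the load figures and the function names (`load_shedding`, `graceful_degradation`) are all assumptions chosen for illustration. It compares routers that refuse traffic when overloaded with routers that stay in rotation and simply run slower.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    routers_serving: int  # routers still accepting traffic once things settle
    slowdown: float       # 1.0 = normal speed; larger means slower responses


def load_shedding(total_load, capacities):
    """Overloaded routers refuse traffic; their share moves to the rest."""
    active = list(capacities)
    while active:
        share = total_load / len(active)
        survivors = [c for c in active if c >= share]
        if len(survivors) == len(active):
            return Outcome(len(active), 1.0)  # stable: nobody is overloaded
        active = survivors                    # load shifts, check again
    return Outcome(0, float("inf"))           # every router has dropped out


def graceful_degradation(total_load, capacities):
    """Every router stays in rotation and slows down in proportion to overload."""
    share = total_load / len(capacities)
    worst = max(share / c for c in capacities)
    return Outcome(len(capacities), max(1.0, worst))


# Hypothetical fleet: eight routers rated at 100 units and two at 95.
fleet = [100] * 8 + [95] * 2

print(load_shedding(960, fleet))         # routers_serving is 0: a full cascade
print(graceful_degradation(960, fleet))  # routers_serving is 10, slightly slower
print(load_shedding(900, fleet))         # with headroom, all 10 keep serving
```

In this toy fleet, a load only about one percent above what the two weakest routers can handle is enough to take every router out of service under load shedding, while the graceful variant keeps all ten serving with a barely noticeable slowdown. The third call shows the more straightforward fix the team mentions: with enough headroom, the cascade never starts.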

I realize it would be better still if Gmail didn’t fail at all, ever. But think of each small outage as one more step along the path to that ultimate nirvana.

Phil Wainewright is a commentator and strategist on emerging software industry trends. See his full profile and disclosure of his industry affiliations.


