May 25th, 2008

Peter Norvig on Google's mistrust of machine learning

Posted by Garett Rogers @ 9:53 am

Categories: Google

Tags: Algorithm, Google Inc., Peter Norvig, Anand, Engineering, Garett Rogers

Peter Norvig has been at Google for a long time now. Until recently he was the Director of Search Quality, the guy who made sure that when you submitted a query, you usually got what you wanted. He has since moved into the Director of Research role, but is taking time off right now to update his textbook, “Artificial Intelligence: A Modern Approach”.

Peter recently sat down with Anand Rajaraman to discuss several things, including search quality at Google; for a great read, check out this article. Anand explains that Google’s search algorithm consists of an offline phase and an online phase. The time-consuming process of discovering and then tagging webpages is done offline and is query-independent, while the online phase happens at the time of search.
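To make the offline/online split concrete, here is a minimal sketch. It is only an illustration and not Google's actual pipeline: the tiny page collection, the `build_index` and `search` function names, and the plain term-matching logic are all assumptions made up for this example.

```python
from collections import defaultdict


def build_index(pages):
    """Offline phase (query-independent): map each term to the pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search(index, query):
    """Online phase: runs at query time and only consults the prebuilt index."""
    terms = query.lower().split()
    if not terms:
        return []
    # Keep only pages that contain every query term.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Hypothetical pages, invented purely for illustration.
pages = {
    "a": "machine learning for search ranking",
    "b": "hand crafted search ranking rules",
    "c": "tractor reviews",
}
index = build_index(pages)      # done once, ahead of time
results = search(index, "search ranking")  # done per query
```

The expensive work happens once in `build_index`; each call to `search` is cheap because it never touches the raw pages.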

The online, query-dependent phase appears to be made-to-order for machine learning algorithms. Tons of training data (both from usage and from the armies of “raters” employed by Google), and a manageable number of signals (200) — these fit the supervised learning paradigm well, bringing into play an array of ML algorithms from simple regression methods to Support Vector Machines.

This setup is perfect for machine learning. Throw in the expert “raters” that Google pays to sift through search results, and you have machine learning just waiting to happen. Researchers at Google have reached the point where a machine-learned model is equal to, or better than, the hand-crafted algorithm that currently sorts Google’s giant index in real time when a user enters a query.
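For readers who have not seen supervised learning applied to ranking, here is a minimal sketch of the idea: learn a weight for each signal from rater-labelled examples, then sort candidate pages by their weighted score. It uses plain linear regression fitted by gradient descent, one of the "simple regression methods" mentioned above. Everything concrete in it is an assumption for illustration: two made-up signals instead of roughly 200, invented rater scores, and hypothetical function names. It is not a description of what Google actually runs.

```python
def fit_linear(signals, labels, lr=0.1, epochs=2000):
    """Fit one weight per signal by gradient descent on mean squared error."""
    n = len(signals)
    dims = len(signals[0])
    weights = [0.0] * dims
    for _ in range(epochs):
        grads = [0.0] * dims
        for x, y in zip(signals, labels):
            error = sum(w * xi for w, xi in zip(weights, x)) - y
            for j in range(dims):
                grads[j] += 2.0 * error * x[j] / n
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


def rank(weights, candidates):
    """Order candidate pages by learned score, best first."""
    def score(item):
        return sum(w * xi for w, xi in zip(weights, item[1]))
    return [name for name, _ in sorted(candidates.items(), key=score, reverse=True)]


# Hypothetical training data: (signal_1, signal_2) per result, plus a rater score.
train_signals = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
rater_scores = [0.7, 0.3, 1.0, 0.41]

weights = fit_linear(train_signals, rater_scores)

# Hypothetical candidate pages for one query.
candidates = {"page_x": (0.9, 0.1), "page_y": (0.2, 0.9)}
ordering = rank(weights, candidates)
```

With these invented labels the fit recovers weights close to 0.7 and 0.3, so `page_x` is ranked ahead of `page_y`.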

So why isn’t Google using this machine-learned model for its search engine, then? Well, Peter suggests that there are two reasons. The first is that the engineers who hand-crafted the current algorithm don’t think a machine could do better. The second, as Anand says, is more interesting: Google worries that machine-learned models may suffer “catastrophic errors on searches that look very different from the training data”.
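The second worry is easy to reproduce on a toy problem. The sketch below is an assumption-laden illustration and has nothing to do with Google's real signals: it pretends the "true" relevance is a curve (`x * x`, chosen arbitrarily), fits a straight line to training inputs between 0 and 1, and then asks the model about an input far outside that range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_relevance(x):
    """Stand-in for the real relationship; an arbitrary curve for illustration."""
    return x * x


# Training data only covers inputs between 0 and 1.
train_x = [0.0, 0.25, 0.5, 0.75, 1.0]
train_y = [true_relevance(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)


def predict(x):
    return slope * x + intercept


# Worst error on inputs that look like the training data.
in_range_error = max(abs(predict(x) - true_relevance(x)) for x in train_x)

# Error on an input that looks nothing like the training data.
far_error = abs(predict(10.0) - true_relevance(10.0))

print(in_range_error, far_error)
```

The model looks accurate on inputs like the ones it was trained on, yet its error on the far-away input is hundreds of times larger. Nothing in the training-time evaluation would have warned you.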

If they are indeed testing this model, it would be very nice to see it as a “search experiment”. What do you think? Are Google’s concerns about machine learning going to keep it from ever becoming the engine that drives results on their search engine?

Garett Rogers is employed as a programmer for iQmetrix, which specializes in retail management software for the wireless industry. See his full profile and disclosure of his industry affiliations.


RE: Peter Norvig on Google's mistrust of machine learning
Training data-vs-real data mismatch is partly what created the recent housing crisis, so I'd say Google is appropriately cautious. (There was recently a cool This American Life episode delving into the root causes of the crisis -- as always, freely available at thislife.org )....
Posted by: dal2010@... Posted on: 05/30/08