
Google Shouldn't Fear: Facebook's Search Engine May Actually Get Worse with More Users

By Chris Dannen

Updated on: June 25, 2010 / 5:06 PM EDT / MoneyWatch

External websites have begun to show up in Facebook's network search results, according to the blog AllFacebook, which has called this the first sign of an "Open Graph search engine." The possibility of "Facebook SEO" heralds a new Facebook-Google (GOOG) rivalry for Web search, according to that same blog, which argues that "the war has begun." But the "war," if there is one, might result in Facebook defeating itself.

Our sister site CNET has attempted to debunk the Facebook vs. Google paradigm, saying that Facebook could never hope to match the "breadth and depth of results produced by companies that actually crawl the Web." But with investor Microsoft's (MSFT) Bing running its outside search, a crawler isn't out of the question. The real question is whether Facebook's own SEO would actually add any value to Bing.

The answer comes down to the debatable power of semantic search, a kind of natural-language counterpart to Google's system (which mostly analyzes keywords to generate results). Facebook has set its sights on a semantic search engine built on its user data. Google has experimented with semantic search, too, allowing users to ask questions in the search box instead of just punching in terms. But to date, the search giant has treated semantics as only a small part of its strategy. Last spring, Google's VP of search product said that the sheer volume of Google's search data can effectively mimic a semantic search engine, because the company has seen so many searches and clicks:

What we're seeing actually is that with a lot of data, you ultimately see things that seem intelligent even though they're done through brute force. Because we're processing so much data, we have a lot of context around things like acronyms. Suddenly, the search engine seems smart, like it achieved that semantic understanding, but it hasn't really. It has to do with brute force. That said, I think the best algorithm for search is a mix of both brute-force computation and sheer comprehensiveness and also the qualitative human component.

But the extent to which past data can improve results has been disputed -- by none other than Google's own Chief Economist, Hal Varian. In August 2009, he told CNET that Google's growing scale doesn't offer much in the way of marginal benefits in search:

... [P]eople keep talking about how more data gives you a bigger advantage. But when you look at data, there's a small statistical point that the accuracy with which you can measure things as they go up is the square root of the sample size. So there's a kind of natural diminishing returns to scale just because of statistics: you have to have four times as big a sample to get twice as good an estimate.
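
Varian's arithmetic can be checked with a short sketch. It assumes the quantity being estimated is a simple sample mean, whose standard error is sigma divided by the square root of the sample size; the function names below are illustrative and do not come from Google or CNET.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def sample_size_for_improvement(n: int, factor: float) -> int:
    """Sample size needed to shrink the standard error by `factor`.

    Because the error falls with sqrt(n), the sample must grow by factor squared.
    """
    return math.ceil(n * factor ** 2)


if __name__ == "__main__":
    base = standard_error(1.0, 100)
    quadrupled = standard_error(1.0, 400)
    print(f"error ratio after 4x the data: {base / quadrupled:.1f}")
    print(f"sample needed to halve the error: {sample_size_for_improvement(100, 2)}")
```

Under that assumption, quadrupling the data only halves the error, which is the diminishing return Varian describes.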

So perhaps Google isn't as far ahead as CNET would like to believe. Perhaps Facebook has a good idea after all. But then again, if Facebook were to expand upon its search function -- which, it's important to note, ranks outside Web results largely on its users' "Likes" and not links, like Google's PageRank does -- it might not make a very good search engine at all. Here's why.

As Clay Shirky argued way back in 2003 in a post about "Power Laws," social networks aren't predisposed to returning useful search results. In networks "where many people are free to choose between many options," Shirky writes, "a small subset of the whole will get a disproportionate amount of traffic."

Take a sip of your coffee and dive into the next quotation to see why (emphasis mine):

[T]hanks to a series of breakthroughs in network theory... we know that power law distributions tend to arise in social systems where many people express their preferences among many options. We also know that as the number of options rise, the curve becomes more extreme. This is a counter-intuitive finding -- most of us would expect a rising number of choices to flatten the curve, but in fact, increasing the size of the system increases the gap between the #1 spot and the median spot.

The reason is that when a system ranks results based on "Likes," it allows one user's choices to affect another's. In Google's keyword search, it doesn't much matter whether my friends are searching with Google or not; Google's utility to me remains unchanged. Facebook is much different. Shirky gives the following example. Let's say you're searching for blogs in Facebook's search engine:

... [P]eople's choices do affect one another. If we assume that any blog chosen by one user is more likely, by even a fractional amount, to be chosen by another user, the system changes dramatically. Alice, the first user, chooses her blogs unaffected by anyone else, but Bob has a slightly higher chance of liking Alice's blogs than the others. When Bob is done, any blog that both he and Alice like has a higher chance of being picked by Carmen, and so on, with a small number of blogs becoming increasingly likely to be chosen in the future because they were chosen in the past.
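
Shirky's Alice, Bob and Carmen example can be sketched as a small simulation. This is a toy model under stated assumptions, not Facebook's ranking algorithm: each user likes exactly one blog, and a blog's chance of being picked grows linearly with the likes it already has. The names `simulate_likes`, `top_share` and the `bias` parameter are hypothetical.

```python
import random
from typing import List


def simulate_likes(n_blogs: int, n_users: int, bias: float, seed: int = 0) -> List[int]:
    """Let users pick one blog each, in sequence.

    Each blog's weight is 1 + bias * (likes so far). With bias = 0 every
    user chooses independently; with bias > 0 earlier choices make a blog
    more likely to be chosen again.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    likes = [0] * n_blogs
    for _ in range(n_users):
        weights = [1.0 + bias * count for count in likes]
        choice = rng.choices(range(n_blogs), weights=weights, k=1)[0]
        likes[choice] += 1
    return likes


def top_share(likes: List[int]) -> float:
    """Fraction of all likes captured by the single most-liked blog."""
    return max(likes) / sum(likes)


if __name__ == "__main__":
    independent = simulate_likes(n_blogs=100, n_users=5000, bias=0.0)
    influenced = simulate_likes(n_blogs=100, n_users=5000, bias=1.0)
    print(f"top blog share, independent choices: {top_share(independent):.3f}")
    print(f"top blog share, influenced choices:  {top_share(influenced):.3f}")
```

In this toy model the most-liked blog captures a noticeably larger share once choices influence one another, which is the rich-get-richer dynamic the passage describes.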

So Facebook's greatest weakness as a search engine is not simply that it is undeveloped, as SearchEngineLand says, nor that it is easily gamed, as CNET and Marty Weintraub have said. It may instead be that as it collects data, it actually becomes worse at finding what users want.

Facebook can counter this effect by having its search engine place more emphasis on inviolable data like names (which are effectively keywords) and location. But of the seven criteria that its search engine is based on, five are subject to the Power Law that allows other people's choices to limit your own. Of course, in some other content systems -- say, Digg or Reddit -- this is exactly the point. But with a search engine, which is especially useful for long-tail, non-obvious content, it might become a serious problem.
