
Don't Know Your Data? Then You Don't Know Business

MySpace (NWS) has announced it will put bulk anonymized data up for sale to researchers and hobbyists. MIT just held its sports analytics conference, and reps from 16 NBA teams attended. The New York Times has a March 12 piece on mommy-blog boot camps devoted to driving up traffic. The message is clear: data's where it's at. But for those of us without a degree in statistics, will our business sense be rendered useless?

Perhaps not. There are some counter-intuitive facets to this fetish for data, after all.

One oddity is the way data-selling seems to fly in the face of basic economics. We're always told that ever more data is being collected, and yet there are those (like MySpace) who'd sell you theirs. To sell a natural resource so bountiful seems almost like a joke: this is like bottled air, right? A piece in the Economist in late February puts a fine point on it, saying that our world data supply is "getting ever vaster ever more rapidly." It called data "superabundant," so voluminous that we won't be able to store nearly all of it. Indeed, the data marketplace that MySpace has chosen to host its set is InfoChimps, which boasts another 7,000+ data sets available for the combing, most for free. (Facebook has been selling its data to market researchers for about a year.)

So perhaps it's the level of detail that makes the MySpace records worthy of buying. While friend lists won't be included, buyers will have access to nameless user playlists, mood updates, zip codes, link recommendations, photos and blog posts. You can even get your MySpace data furnished with geographical data for a little extra money: while raw data dumps start at $10, add-ons like latitude and longitude can push the price to a few hundred bucks.

Seems cheap, doesn't it? Perhaps that's because the parts of MySpace's dataset are carefully isolated to make them anonymous. But at least one academic paper has argued that if anonymized data is actually useful, then it's not truly anonymous, meaning that the data MySpace (and Facebook) sell is either inherently worthless or a big privacy problem.

Datasets, therefore, are valuable but mercurial oracles. That's their curious duality. It's also the lesson Netflix (NFLX) is learning the hard way. This week the company was forced to cancel its second "recommendation engine" contest, which relied on giving contestants anonymized Netflix data, after a class action lawsuit and an FTC notice made it clear that the data could be used to track down actual people and learn things about them. In fact, it's not so easy to fully anonymize data, as AOL also discovered in 2006.

That's the difference between the suspiciously cheap MySpace data offering and the kind of data that brought the NBA to MIT: the NBA's has names attached, and MySpace's doesn't. It's the reason that when Mint.com decided to sell its data, it was actually only in the business of selling results, not raw data. It's also the reason that OKTrends, the blog of the dating site OKCupid, is so incredibly fascinating: they're working with their own data behind the curtain and showing us the results.

So go ahead and analyze your own company data -- but don't expect miracles. As I wrote in February, on the heels of the Microsoft (MSFT) and Yahoo (YHOO) search deal, the game of data-analysis gets tough pretty quick, because it's a game of diminishing returns: you need four times as much data to improve your last estimate by 50%.
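
To make the arithmetic behind that last claim concrete, here is a minimal sketch. It assumes the "estimate" is a simple average of independent observations, and that "improving by 50%" means halving its standard error; under those assumptions the error shrinks with the square root of the sample size. The function names `standard_error` and `data_multiplier` are hypothetical, chosen only for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean built from n independent observations.

    Assumption: observations are independent with standard deviation sigma.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


def data_multiplier(error_reduction: float) -> float:
    """How many times more data is needed to cut the error by a given fraction.

    error_reduction=0.5 means "halve the error". Because the error scales
    as 1/sqrt(n), the required sample grows with the square of the improvement.
    """
    if not 0 <= error_reduction < 1:
        raise ValueError("error_reduction must be in [0, 1)")
    return 1.0 / (1.0 - error_reduction) ** 2


# Compare a sample of 1,000 with one four times as large.
base = standard_error(sigma=1.0, n=1000)
quadrupled = standard_error(sigma=1.0, n=4000)
ratio = quadrupled / base          # fraction of the original error that remains

# Data needed to halve the error, expressed as a multiple of the current sample.
multiplier = data_multiplier(0.5)

print(f"Error remaining after quadrupling the data: {ratio:.2f}")
print(f"Data multiplier needed to halve the error: {multiplier:.1f}")
```

The same rule is why returns diminish so quickly: under these assumptions, cutting the error by 90% would take roughly 100 times as much data, not 10 times.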
