Study: What A Web We Wove

Just how big is the Internet? According to a new study released Wednesday, the Internet has become so large that sophisticated search engines are just scratching the surface of the Web's tremendous information reservoir.

A South Dakota company that prepared the research paper has developed new software to plumb the Internet's depths, reports CBS News Correspondent Dan Raviv. It estimates that the World Wide Web is 500 times larger than the maps provided by popular search engines like Yahoo!, AltaVista and Google.com.

BrightPlanet, the Sioux Falls start-up behind the report, believes it has developed a solution with software called LexiBot. With a single search request, the technology not only searches the pages indexed by traditional search engines, but delves into the databases on the Internet and fishes out the information contained in them.

Such hidden information coves, well-known to the 'Net savvy, have become a tremendous source of frustration for researchers who can't find the information they need with a few simple keystrokes.

"These days it seems like search engines are a little like the weather: Everyone likes to complain about them," said Danny Sullivan, editor of SearchEngineWatch.com, which analyzes search engines.

For years, the uncharted territory of the World Wide Web has been dubbed the "invisible Web."

BrightPlanet describes the terrain as the "deep Web" to distinguish it from the surface information captured by Internet search engines.

"It's not an invisible Web anymore. That's what's so cool about what we are doing," said Thane Paulsen, BrightPlanet's general manager.

Many researchers suspected that these underutilized outposts of cyberspace represented a substantial chunk of the Internet, but no one seems to have explored the Web's back roads as extensively as BrightPlanet.

Deploying new software developed over the last six months, BrightPlanet estimates there are now about 550 billion documents stored on the Web. Combined, Internet search engines index about 1 billion pages. One of the first Web search engines, Lycos, had an index of 54,000 pages in mid-1994.

While search engines obviously have come a long way since 1994, the reason they aren't indexing even more pages is that an increasing amount of information is stored in evolving, giant databases set up by government agencies, universities and corporations.

Search engines rely on technology that generally identifies "static" pages, rather than the "dynamic" information stored in databases.

This means that general-purpose search engines will guide users to the home site that houses a huge database, but finding out what's inside requires additional queries.
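A rough sketch of that distinction, written in Python with invented URLs and an assumed "q" query parameter (this is an illustration, not BrightPlanet's LexiBot):

import re
import urllib.request
import urllib.parse

def crawl_static(url):
    """Return the hyperlinks a conventional crawler could follow from a page."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    # A crawler only sees content reachable through plain links like these.
    return re.findall(r'href="([^"]+)"', html)

def query_database(search_url, term):
    """Retrieve 'dynamic' content by submitting the query a person would type.

    The page returned here is generated on demand, so it never appears in the
    static link graph that crawl_static() walks. The 'q' field name and the
    search_url are assumptions made for illustration only.
    """
    params = urllib.parse.urlencode({"q": term})
    return urllib.request.urlopen(f"{search_url}?{params}").read()

if __name__ == "__main__":
    # example.com serves a simple static page; a search engine could index it.
    print(crawl_static("http://example.com/"))
    # A deep-Web document would require something like:
    #   query_database("http://agency.example/archive/search", "water quality")
    # which no link-following crawler ever performs.

The second function is the step crawlers never take, which is why pages held inside databases stay off the search engines' maps.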

The LexiBot isn't for everyone, BrightPlanet executives concede. For one thing, the software costs money - $89.95 after a free 30-day trial. For another, a LexiBot search isn't fast. Typical searches will take 10 to 25 minutes to complete, but could require up to 90 minutes for the most complex requests.

"If you are frustrated about what you can't find on the Internet, then you are a target audience," Paulsen said. "This isn't for grandma when she is looking for chocolate chip recipes on the Internet."

The privately held company, trying to raise money from venture capitalists, expects LexiBot to be particularly popular in academic and scientific circles. The company also plans to sell its technology and services to businesses looking to provide relevant information for visitors to their sites.

About 95 percent of the information stored in the deep Web is free, according to BrightPlanet. Much of it is technical information that is extraordinarily useful, researchers said. The company has listed 20,000 of the "content-rich" databases uncovered by LexiBot on a Web site, completeplanet.com.

Another Web site, invisibleweb.com, already offers a similar directory of large databases on the Internet.

Despite some grumbling, most mainstream Internet users seem satisfied with the free search engines that serve as the Web's road map.

In a survey of 33,000 search engine users earlier this year, NPD New Media Services found that 81 percent of the respondents said they find what they are looking for all or most of the time.

That was an improvement from 77 percent of search engine users reporting a positive experience in the fall of 1999. Only 3 percent of the search engine users said they never find what they want.

Several Internet veterans who reviewed BrightPlanet's research Wednesday were intrigued by the company's software, but warned that it could overwhelm users.

"The World Wide Web is getting to be so humongous that you need specialized engines. A centralized approach like this isn't going to be successful," predicted Carl Malamud, co-founder of Petaluma-based Invisible Worlds.

Like BrightPlanet, Invisible Worlds is trying to extract more data hidden from search engines, but is customizing the information. Malamud calls this process "giving context to the content."

Sullivan agreed that BrightPlanet's greatest challenge will be showing businesses and individuals how to effectively deploy the company's product.

"They are going to have to show people where to dive in," he said. "Otherwise, people will just drown."

©2000 CBS Worldwide Inc. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed. The Associated Press contributed to this report.
