
Google To Scan Library Volumes

December 14, 2004 / 7:44 AM EST / CBS/AP

Stacks of hard-to-find books are being scanned into Google Inc.'s widely used Internet search engine in its attempt to establish a massive online reading room for five major libraries.

Material from the New York Public Library as well as libraries at four universities — Harvard, Stanford, Michigan and Oxford — will be indexed on Mountain View, Calif.-based Google under the ambitious initiative announced late Monday.

The Michigan and Stanford libraries are the only two so far to agree to submit all their material to Google's scanners.

The New York library is allowing Google to include a small portion of its books no longer covered by copyright while Harvard is confining its participation to 40,000 volumes so it can gauge how well the process works. Oxford wants Google to scan all its books originally published before 1901.

Scanning books so they can be read through computers isn't new. Both Google and Amazon.com already have programs that offer online glimpses of new books, while an assortment of other sites have for several years provided digital access to some material in libraries scattered around the country.

But Google's latest commitment could have the biggest impact yet, given the breadth of material that the company hopes to put into its search engine, which has become renowned for its processing speed, ease of use and accuracy.

"It's a significant opportunity to bring our material to the rest of the world," said Paul LeClerc, president of the New York Public Library. "It could solve an old problem: If people can't get to us, how can we get to them?"

Librarians are also excited about the prospect of creating a digital record for the reams of valuable material written long before computers were conceived.

"This is the day the world changes," said John Wilkin, a University of Michigan librarian working with Google. "It will be disruptive because some people will worry that this is the beginning of the end of libraries. But this is something we have to do to revitalize the profession and make it more meaningful."

The project gives Google's search engine another potential drawing card as it faces stiffening competition from Yahoo Inc. and Microsoft Corp.'s MSN. Attracting visitor traffic is crucial to Google's financial health because the company depends on revenue generated by people clicking on advertising links posted next to the main body of search results.

The competition with Yahoo, Microsoft and Amazon could also help libraries in digitizing their collections for their own institutional uses.

"Within two decades, most of the world's knowledge will be digitized and available, one hopes, for free reading on the Internet, just as there is free reading in libraries today," Michael A. Keller, Stanford University's head librarian, told The New York Times.

"Our world is about to change in a big, big way," said Daniel Greenstein, university librarian for the California Digital Library of the University of California, which is a project to organize and retain existing digital materials.

Scanning the library books figures to be a daunting task, even for a cutting-edge company such as Google, whose online index of 8 billion Web pages has already revolutionized the way people look for information.

Michigan's library alone contains 7 million volumes, about 132 miles of books. Google hopes to get the job done at Michigan within six years, Wilkin said.

Harvard's library is even larger, with 15 million volumes. Virtually all of that material will be off limits until Google shows it can scan the material without losing or damaging anything, said Harvard professor Sidney Verba, who is also director of the university's library.

"The librarians at Harvard are very punctilious about protecting their great treasures," Verba said.

The project also poses other prickly issues, such as how to convert material written in foreign languages, and the issue of protecting copyrighted books.

As it does with new books already included in its search engine, Google will only allow its users to view the bibliographies or other snippets of copyrighted books scanned from the libraries. The search engine will provide unrestricted access to all material in the public domain — works no longer covered by copyrights.

The books scanned from libraries will be included in the same Google index that spans the Web. By throwing everything into the same pot, Google risks burying the library book results far below the Web documents containing the same search terms, reducing the usefulness of the feature, said Danny Sullivan, editor of Search Engine Watch, an industry newsletter.