Digitizing Books For Google: No Quick Task

In a room dimly lit for the sake of capturing the pages of fragile antique books, Courtney Mitchel helps a giant desktop machine digest a rare, centuries-old Bible in Ann Arbor, Mich., March 21, 2008. AP Photo/Carlos Osorio

In a dimly lit back room on the second level of the University of Michigan library's book-shelving department, Courtney Mitchel helped a giant desktop machine digest a rare, centuries-old Bible.

Mitchel is among hundreds of librarians from Minnesota to England making digital versions of the most fragile of the books to be included in Google Inc.'s Book Search, a portal that will eventually lead users to all the estimated 50 million to 100 million books in the world.

The manual scanning - at up to 600 pages a day - is much slower than Google's regular process.

"It's monotonous," the 24-year-old said.

Then she knit her career hopes into the work.

"But it's still something that I'm learning about - how to interact with really old materials and working with digital imaging, which is relevant to art history."

The unusually tight binding on the early-16th-century polyglot Bible made it hard to expose the portions toward the book's middle as Mitchel spread each pair of pages for the scanner. Librarians believe it is the oldest Bible in the world with Arabic type.

Google, the Internet's leader in search and advertising, says the process it developed and is using for scanning the majority of the books in Book Search is proprietary. Employees will not discuss it except to say it is much faster than what Mitchel is doing and it's not destructive.

"It took us quite a while to develop it so we do keep that confidential," said a library manager for Book Search, Ben Bunnell, who declined even to say where Google does the scanning.

Many libraries began digitizing books a decade ago to preserve them. Funding from Google allows the 28 libraries it's working with to cut their digitizing costs because they don't have to pay for scanning the books Google wants to include in Book Search.

Through Book Search, users can track down a book on any topic they're interested in and read a small portion. If the book's not protected by copyright, users can download the whole thing. If it is, or if they just want to read an original, they can use Book Search to find copies to buy or borrow.

More than 1 million rare or fragile books have been digitized through the Google-Michigan partnership since it began in 2004, with an estimated 6 million to go.

Book Search has the support of many publishers, authors and librarians, including Cambridge University Press and Wisdom Publications. But some publishers and authors have sued, claiming the service violates their copyrights. Google says Book Search is aboveboard because Web surfers can retrieve only snippets of copyrighted material through the service.

Brewster Kahle, founder and digital librarian of the Internet Archive at the Open Content Alliance, said Google may be trying to "lock up the public domain" by making proprietary copies of works whose copyrights have expired - which includes the vast majority of the world's books.

Kahle said there's a core value in the project, in preserving material indefinitely and enabling broad access to it. But he questioned whether Google will share the works it digitizes with other search engines.

"We believe there should be many libraries, many publishers, many search engines, many types of users from different points of view," Kahle said.

John Price Wilkin, Michigan's associate university librarian, called Kahle's stance "theoretical."

"Our volumes are entirely open in the sense that people can find them, read them, use them, do all the things that they would do in scholarship or pleasure," Wilkin said.

In the room where Mitchel and colleague Chava Israel, an artist, work, the temperature is always in the 60s.

Each technician has a slightly angled table with a flexible middle that cradles books and holds them still while two overhead cameras photograph the pages. Sometimes the women play music or listen to news online, but they often work in silence, save the clicks of their computers and scanners.

Mitchel glides in a rolling chair back and forth between scanner and computer, computer and scanner, turning page upon page and clicking her mouse to shoot each pair. Once the images reach the computer, the women use the book scanning software Omniscan from Germany's Zeutschel GmbH to clean them up.

A final click of the mouse sends each digitized book to Google for optical character recognition processing, which makes the text searchable. Google then returns a copy of the images and data to the library and posts another to the Web.

Israel, 44, who has been scanning books for three years, takes a philosophical view of the project.

"My favorite part is working with older books and being able to preserve a lot of the knowledge and help bring more people access," Israel said. "I turn pages. It's kind of meditative."
  • CBSNews
