Using Spam Blockers To Preserve Literature
Inventor Of Those Distorted Words You Type In Online Hopes To Give Them A Higher Purpose
The inventor of "CAPTCHAS," the letter puzzles credited with stopping spam, has found a useful vocation for the time-wasting tests in the realm of digital book preservation. Daniel Sieberg reports.
Luis von Ahn, right, pictured while still a graduate student in 2002, helped create captchas, those puzzling strings of letters and numbers that you're forced to type in when shopping online. Now he hopes to put the system to a more lofty use. (CBS/AP)
They're called Captchas (an acronym for "Completely Automated Public Turing Test To Tell Computers and Humans Apart") and they're a major line of defense against spam. Without them, spammers’ computers could automatically sign up for millions of e-mail accounts. But computers can't read letters and numbers that are slightly distorted. The captchas require a human to type them in, and that shuts down the spammers.
Captchas are a helpful tool, but they also waste time: people around the world spend hundreds of thousands of hours every day solving them. Now a professor wants to turn that wasted time into an effort to preserve the world’s great literature, reports CBS News technology correspondent Daniel Sieberg.
Seven years ago Carnegie Mellon computer science professor Luis von Ahn helped create captchas for Internet giants like Yahoo. Captchas have been successful in slowing spammers, and now he wants to improve on his idea for a project he believes is essential to the world's history: preserving books.
“There are hundreds of millions of books that need to be digitized,” he said, “and that's what reCAPTCHA is going to allow you to do.”
With reCAPTCHA, von Ahn is asking all Web sites to switch to his new tool. No catch. Instead of typing in captchas that are gibberish like "vr4tyw77," you would type in actual words -- words from books that need to be digitized. Those books then become part of the World Wide Web, accessible to all of us.
Brewster Kahle runs the Internet Archive and is already seeing dividends from von Ahn’s efforts.
“With 500 million people typing in words every day, we can get something done in a couple of years that would have taken hundreds,” Kahle said.
All thanks to the power of the Internet and a brilliant professor -- with a little help from millions of his closest friends.
© MMVII, CBS Interactive Inc. All Rights Reserved.



If we're to type the words we see on the screen to help digitize books -- where are these words coming from on the computer side of things? They've got to be already 'digitized' and on a database somewhere!?
How's some server going to know how to order these particular words to quote 'War and Peace' word for word? It would 'have' to have the book digitized in some format already.
... and what about copyright laws?
"But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct."