PITTSBURGH, Oct. 20, 2007

Using Spam Blockers To Preserve Literature

Inventor Of Those Distorted Words You Type In Online Hopes To Give Them A Higher Purpose

Luis von Ahn, right, pictured while still a graduate student in 2002, helped create captchas, those puzzling strings of letters and numbers that you're forced to type in when shopping online. Now he hopes to put the system to a more lofty use. (CBS/AP)

(CBS)  If you've ever signed up for an e-mail account or bought something online, you've probably seen them: puzzling strings of letters and numbers that you're forced to type in.

They're called captchas (an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart"), and they're a major line of defense against spam. Without them, spammers' computers could automatically sign up for millions of e-mail accounts. But computers have trouble reading letters and numbers that are slightly distorted, so a captcha effectively requires a human to type it in, and that shuts out the spammers.

But while captchas are a helpful tool, they also waste time: people around the world spend hundreds of thousands of hours solving them every day. Now a professor wants to turn that wasted time into an effort to preserve the world's great literature, reports CBS News technology correspondent Daniel Sieberg.

Seven years ago, Carnegie Mellon computer science professor Luis von Ahn helped create captchas for Internet giants like Yahoo. Captchas have been successful in slowing spammers, and now he wants to put his idea to work on a project he believes is essential to the world's history: preserving books.

“There are hundreds of millions of books that need to be digitized,” he said, “and that's what reCAPTCHA is going to allow you to do.”

With reCAPTCHA, von Ahn is asking all Web sites to switch to his new tool. There's no catch. Instead of typing gibberish like "vr4tyw77," you would type actual words: words from scanned books that optical character recognition (OCR) software could not read and that still need to be digitized. Those books then become part of the World Wide Web, accessible to all of us.

Brewster Kahle runs the Internet Archive and is already seeing dividends from von Ahn's efforts.

“With 500 million people typing in words every day, we can get something done in a couple of years that would have taken hundreds,” he said.

All thanks to the power of the Internet and a brilliant professor -- with a little help from millions of his closest friends.

© MMVII, CBS Interactive Inc. All Rights Reserved.

by red1530 October 20, 2007 8:32 PM PDT
This sounds like a good idea.
by gunnerone2 October 20, 2007 11:52 PM PDT
Great idea. Wish I had thought of it.
by Krazcarl October 22, 2007 3:35 AM PDT
Impressive, great idea. Hope it catches on; I would like to be a part of an endeavor like that. Guess that's why he makes the big bucks.
by yo_marc October 22, 2007 9:08 AM PDT
I absolutely do not understand this!?

If we're to type the words we see on the screen to help digitize books, where are these words coming from on the computer side of things? They've got to be already 'digitized' and in a database somewhere!?

How's some server going to know how to order these particular words to quote 'War and Peace' word for word? It would 'have' to have the book digitized in some format already.

... and what about copyright laws?
by sharkzone1 October 22, 2007 11:34 AM PDT
The only way I can see it working is if only part of the text needs to be transcribed and the other part is already known.
by sharkzone1 October 22, 2007 11:38 AM PDT
From re-captcha:

"But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct."
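reCAPTCHA's description above can be modeled in a few lines of Python. This is only a toy sketch under stated assumptions, not reCAPTCHA's actual implementation: the `WordBank` class, the word IDs, and the rule that three matching answers settle an unknown word are all hypothetical, and images, distortion, and abuse handling are left out entirely.

```python
import random
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WordBank:
    """Toy model of the known-word / unknown-word pairing scheme."""

    # Scanned words whose transcription is already known ("control" words),
    # stored lowercase.
    known: dict = field(default_factory=dict)
    # Answers collected so far (a Counter per word) for words OCR could not read.
    votes: dict = field(default_factory=dict)
    # Hypothetical acceptance rule: this many matching answers settle a word.
    min_agreeing_votes: int = 3

    def make_challenge(self, rng: random.Random) -> list:
        """Pair one known word with one unreadable word, in random order."""
        pair = [rng.choice(sorted(self.known)), rng.choice(sorted(self.votes))]
        rng.shuffle(pair)
        return pair

    def grade(self, challenge: list, typed: list) -> bool:
        """Pass the user only if every known word matches; then count their
        reading of each unknown word as one vote."""
        control_ok, pending = [], []
        for word_id, text in zip(challenge, typed):
            text = text.strip().lower()
            if word_id in self.known:
                control_ok.append(text == self.known[word_id])
            else:
                pending.append((word_id, text))
        passed = bool(control_ok) and all(control_ok)
        if passed:
            for word_id, text in pending:
                self.votes[word_id][text] += 1
                self._maybe_accept(word_id)
        return passed

    def _maybe_accept(self, word_id: str) -> None:
        """Promote an unknown word to 'known' once enough users agree."""
        answer, count = self.votes[word_id].most_common(1)[0]
        if count >= self.min_agreeing_votes:
            del self.votes[word_id]
            self.known[word_id] = answer  # now usable as a control word


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical word IDs and readings.
    bank = WordBank(known={"scan-001": "morning"}, votes={"scan-002": Counter()})
    truth = {"scan-001": "Morning", "scan-002": "upon"}
    for _ in range(3):
        challenge = bank.make_challenge(rng)
        print(bank.grade(challenge, [truth[w] for w in challenge]))
    print(bank.known)
```

In this sketch a user is judged only on the known word, so a careless answer to the unknown word costs nothing but also counts for little: it is one vote among the several that must agree before the word is treated as digitized, after which it can itself serve as a control word. As written, `make_challenge` raises `IndexError` once no unreadable words remain.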