February 25, 2010 6:28 AM
Google Patent Auto-Converts Print Publications to E-Articles
(MoneyWatch)
A patent application by Google (GOOG), filed in August 2008 and only made public last week, shows that the company is working on an automated way to split printed magazines and newspapers into individual articles that it could then deliver separately. Although this could allow Google to convert stacks of periodicals into electronic archives, it potentially sends the company headlong into conflict with a famous Supreme Court ruling on media law.
Application 20100040287 is titled Segmenting Printed Media Pages Into Articles. The company has already shown its interest in getting periodicals online as part of Google Books. But there are two problems. The technical one is tricky, as the application describes:
Complex printed media material, such as a newspaper, often involve columns of body text, headlines, graphic images, multiple font sizes, comprising multiple articles and logical elements in close proximity to each other, on a single page. Attempts to utilize optical character recognition in such situations are typically inadequate resulting in a wide range of multiple errors, including, for example, the inability to properly associate text from multiple columns as being from the same article, mis-associating text areas without an associated headline or those articles which cross page boundaries, and classifying large headline fonts as a graphic image.
The application describes how Google would detect blocks of text and determine how they fit together into articles. The implications are clear. Once Google could break scanned magazines and newspapers down into individual articles, it could store and serve up those articles, perhaps using optical character recognition to create text versions and then using the context for search as well as advertising.
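As a rough illustration of what "determine how they fit together into articles" can mean, here is a minimal Python sketch. It is not Google's claimed method. It assumes an earlier OCR step has already found the text blocks, labeled each one as a headline or body text, and measured its bounding box. The `Block` type, those labels, and the rule of attaching each body block to the nearest headline above it that overlaps it horizontally (a crude stand-in for "same column") are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A text region found on a scanned page (hypothetical upstream OCR output)."""
    text: str
    kind: str  # assumed label: "headline" or "body"
    x0: float  # left edge
    y0: float  # top edge (y grows downward, as on a page image)
    x1: float  # right edge
    y1: float  # bottom edge


def overlaps_horizontally(a: Block, b: Block) -> bool:
    """True if the two blocks share some horizontal extent (a rough 'same column' test)."""
    return a.x0 < b.x1 and b.x0 < a.x1


def group_into_articles(blocks: list[Block]) -> dict[str, list[str]]:
    """Attach each body block to the nearest headline above it that overlaps it horizontally.

    Assumes headline texts are unique on the page. Body blocks with no such
    headline are collected under "<unassigned>" instead of being guessed at.
    """
    headlines = [b for b in blocks if b.kind == "headline"]
    groups: dict[str, list[Block]] = {h.text: [] for h in headlines}
    orphans: list[Block] = []

    for body in (b for b in blocks if b.kind == "body"):
        candidates = [
            h for h in headlines
            if h.y1 <= body.y0 and overlaps_horizontally(h, body)
        ]
        if candidates:
            # The lowest qualifying headline is the closest one above the block.
            nearest = max(candidates, key=lambda h: h.y1)
            groups[nearest.text].append(body)
        else:
            orphans.append(body)

    def reading_order(group: list[Block]) -> list[str]:
        # Read column by column (left to right), then top to bottom.
        return [b.text for b in sorted(group, key=lambda b: (b.x0, b.y0))]

    articles = {title: reading_order(group) for title, group in groups.items()}
    if orphans:
        articles["<unassigned>"] = reading_order(orphans)
    return articles


# Toy page: a two-column story, a one-column story below it, and a sidebar item.
page = [
    Block("Mayor Resigns", "headline", 0, 0, 200, 20),
    Block("The mayor said...", "body", 0, 25, 95, 150),
    Block("...continued in column two", "body", 105, 25, 200, 150),
    Block("Local Team Wins", "headline", 0, 160, 95, 175),
    Block("The team won...", "body", 0, 180, 95, 300),
    Block("Weather: sunny", "body", 210, 0, 300, 40),
]

print(group_into_articles(page))
```

The nearest-headline rule is deliberately naive. It would mishandle exactly the cases the application lists, such as articles that continue onto another page or text with no headline of its own, which is why unmatched blocks are set aside rather than forced into an article.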
There's just one legal problem: New York Times Co., et al. v. Jonathan Tasini et al. In what is usually called the Tasini case, freelance writers sued the New York Times and other print publications for licensing individual articles to database companies without permission from the writers, who retained the copyright on the articles. One of the main turning points was that the publishers had explicit permission only to include the articles in the print publication, and copyright law did not allow the publishers to break their publications apart and make the articles accessible to readers outside the original context.
Google's patent application describes a process that would do exactly that. If the company received permission from the appropriate rights holders, that would be possible. But going through years of magazines to determine who exactly could legally give permission would be extremely difficult and time-consuming. Google could do as it did with scanning books: act and wait to be sued. If it did, chances are good that the company would waltz right into another lawsuit, as it has with Google Books, only this time the precedent, a clear Supreme Court decision, would wipe away much of the legal ambiguity Google might want to claim. And freelance writers have shown themselves ready to call their lawyers.
[UPDATE: It dawned on me that I had missed an extra twist on the legal front. In the Google Books case, the publishers could also bring suit, and bring their larger-than-freelance resources to bear, because Google was potentially infringing their rights as well. If Google goes back far enough in magazine and newspaper archives, to a time before publishers commonly demanded and got extensive rights, then by breaking out individual articles it would be dealing only with the freelance writers, most of whom have not registered copyright on their articles. That means most of the writers could not bring a suit without first registering. Even if the freelancers registered copyright after the infringement, they'd be limited to seeking actual damages and the "profits" from use of their material and couldn't even recover legal fees. That would effectively leave Google free to use the material, knowing that the writers could not afford to challenge the company in court. And for the small portion of writers who had registered their copyright, Google has plenty of money to fight them in court.]
Erik Sherman is a widely published writer and editor who also does select ghosting and corporate work. Follow him on Twitter at @ErikSherman or on Facebook.