Scientific world getting duped by computerized fake research papers

CBS News

If you've ever had occasion to read an academic paper in almost any field in which you don't hold a PhD, chances are that it seems like so much gobbledygook. And there's some chance that it literally is.

A 2005 prank by three MIT graduate students has given way to a new genre: computer-generated fake academic papers, as the Guardian reports. Back then, the students wrote a program called SCIgen in just days that would take high-minded terms from academic writing, paste them together, and create papers that had literally no research or meaning behind them.

The program used a "context-free grammar," a set of rules for assembling sentences from interchangeable parts, and added graphics, figures, and citations. The students -- Jeremy Stribling, Max Krohn and Dan Aguayo -- didn't stop there. They submitted two papers under their names to a scientific conference. One was accepted. Here's the gibberish-filled abstract:
Many physicists would agree that, had it not been for congestion control, the evaluation of web browsers might never have occurred. In fact, few hackers worldwide would disagree with the essential unification of voice-over-IP and public-private key pair. In order to solve this riddle, we confirm that SMPs can be made stochastic, cacheable, and interposable.
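
For readers curious how a context-free grammar can turn out text like this, here is a minimal sketch in Python. The grammar rules and the names GRAMMAR, expand and generate_sentence are invented for illustration and are not SCIgen's actual rules or code. The idea is simply that each placeholder is replaced by a randomly chosen expansion until only ordinary words remain.

```python
import random

# Toy grammar, invented for illustration; these are NOT SCIgen's actual rules.
# Uppercase keys are placeholders (nonterminals); each maps to a list of
# possible expansions, and each expansion is a list of words or placeholders.
GRAMMAR = {
    "SENTENCE": [
        ["OPENER", "we", "VERB", "that", "NOUN", "can be made",
         "ADJ", "and", "ADJ", "."],
        ["OPENER", "NOUN", "and", "NOUN", "are", "ADJ", "."],
    ],
    "OPENER": [["In fact,"], ["In order to solve this riddle,"], ["Clearly,"]],
    "VERB": [["confirm"], ["argue"], ["demonstrate"]],
    "NOUN": [["SMPs"], ["web browsers"], ["congestion control"],
             ["voice-over-IP"]],
    "ADJ": [["stochastic"], ["cacheable"], ["interposable"]],
}


def expand(symbol, rng):
    """Recursively replace a placeholder with one randomly chosen expansion."""
    if symbol not in GRAMMAR:
        # Ordinary word or phrase: nothing left to expand.
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Build one grammatical but meaningless sentence."""
    rng = random.Random(seed)
    text = " ".join(expand("SENTENCE", rng))
    # Attach the final period to the last word.
    return text.replace(" .", ".")


if __name__ == "__main__":
    for _ in range(3):
        print(generate_sentence())
```

Every sentence that comes out is grammatical because the rules guarantee it, but nothing in the program knows or cares what the words mean.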
Academic publishing industry spam

Why did they do it? The trio was tired of all the emails they'd receive from conference organizers looking for papers. They thought that the standards for the conferences, which charge hefty fees for attendance, might not be rigorous. It might have ended there -- with the one-time submission of their nonsensical papers -- except that they made the program freely available, and it has become quite popular.

According to newly published research reported in the journal Nature, more than 120 papers in subscription databases were computer-generated fakes. The Institute of Electrical and Electronics Engineers (IEEE), a major professional organization in electronic engineering and computer science, had published more than 100 of them.

Computer scientist Cyril Labbé of Joseph Fourier University in Grenoble, France, developed software to detect the fakes generated by SCIgen. Some evidence suggests that not all the authors listed on the papers were involved.

According to Labbé, researchers are pressured to keep churning out papers and publish as much as possible -- the old "publish or perish" saying in academia. Maybe this was a stealth campaign to discredit the practice, or perhaps some people thought it could add to their credentials without requiring more hours of work.

But there's a darker side. SCIgen deals only in nonsense. What if you could harness some of the more startling developments in artificial intelligence to create work? (IBM's Watson program that could win at Jeopardy comes to mind.) Feed in information and let a computer put together a syntactically correct paper that expresses meaning, rather than relying on a context-free grammar that stops at putting the correct parts of speech in the right places, regardless of how the result sounds.

You might then be able to create fake papers that could not automatically be distinguished from real ones. Such a program could cite real previous papers, making an auto-paper even more difficult to discover, because it would exhibit real ties to the rest of the academic publishing industry. (You could even throw in the occasional typo or misspelling to make things look even more authentic.) How could anyone know where human research left off and an academic Terminator picked up?


    Erik Sherman is a widely published writer and editor who also does select ghosting and corporate work. The views expressed in this column belong to Sherman and do not represent the views of CBS Interactive. Follow him on Twitter at @ErikSherman or on Facebook.
