Voice Analysis Lacks Accuracy
Intelligence analysts examining the audiotape message that purports to be from Osama bin Laden may be convinced the voice is bin Laden's - but they will never be sure.
Computer voice analysis lacks the accuracy of fingerprint or DNA identification and can be hamstrung by a skilled impersonator or low-quality recording.
"Where there's a combination of strong motivation and relatively weak science, there's an opportunity for deception," said Robert Berkovitz, a speech analyst with Sensimetrics Corp. "You can't put the voice in a slot and have it come out saying, 'This is Joe Smith.'"
Analysts at the National Security Agency and CIA who are handling the tape are measuring two kinds of voice characteristics against previous recordings, experts said.
First, they measure the acoustics that give an idea of the physical features of a person's vocal tract - the shapes of the mouth, throat and nasal passages used in speech.
They also study the style of speech - the timing, speed and pitch - while looking for distinctive intonations, Berkovitz said.
Neither measurement is exact, but a close resemblance on both counts can strongly suggest that the voice is a match.
"You can say with some probability, but you can never be sure," said Kenneth Stevens, a Massachusetts Institute of Technology expert in speech analysis and synthesis.
In the case of bin Laden, where convincing the world that he is alive might be important, a skilled impersonator using a low-quality recording could fool some analysts, Berkovitz said.
"If you have someone who can read like bin Laden, they can have him read it," he said. "People can change their voices so easily."
Analysts poring over bin Laden's voiceprint would be looking at computerized "wave forms" that plot a voice's pitch, speed and volume into dark and light shapes resembling a topographical map, said Brian Moncur, technology director at Fonix Corp. of Salt Lake City, which sells speech recognition and synthesis software.
For instance, the wave forms of the word "America" from one or more known bin Laden recordings might be overlaid atop the wave form of "America" from the current audiotape.
Problem is, the wave forms never match each other exactly.
"People hardly ever pronounce the same word the same way twice, even in the same utterance," Berkovitz said.
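To illustrate why an overlay yields a degree of similarity rather than a yes-or-no answer, here is a minimal sketch. It is not the method the analysts use, which the article does not describe in detail. It assumes each utterance has been reduced to a short pitch contour and compares contours with dynamic time warping, a standard technique swapped in here for illustration. The function name and all numbers are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Lets one sequence stretch or compress in time so that two
    renditions of the same word spoken at different speeds can
    still be lined up before their differences are summed.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds
                cost[i][j - 1],      # b advances, a holds
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical pitch contours in Hz for one word; invented values,
# not measurements from any real recording.
known_a = [110.0, 118.0, 125.0, 121.0, 112.0]
known_b = [111.0, 111.0, 119.0, 126.0, 120.0, 113.0]  # same speaker, slower start
unknown = [140.0, 152.0, 160.0, 151.0, 139.0]         # a different voice

same = dtw_distance(known_a, known_b)
diff = dtw_distance(known_a, unknown)
print("same speaker:", same)
print("different speaker:", diff)
```

Even the two contours from the same hypothetical speaker come out with a nonzero distance, so any decision rests on how small is small enough. That threshold is a judgment call, which is consistent with the experts' point that a match can be stated only with some probability.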
Language analysis software can handle recordings in Arabic.
Wave forms "are pretty much language-independent," Moncur said. "We all use the same organs to make the same speech. It's a mathematical representation."
By Jim Krane
© 2009 The Associated Press. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.