Voice Analysis Lacks Accuracy

Intelligence analysts examining the audiotape message that purports to be from Osama bin Laden may be convinced the voice is bin Laden's - but they will never be sure.

Computer voice analysis lacks the accuracy of fingerprint or DNA identification and can be hamstrung by a skilled impersonator or low-quality recording.

"Where there's a combination of strong motivation and relatively weak science, there's an opportunity for deception," said Robert Berkovitz, a speech analyst with Sensimetrics Corp. "You can't put the voice in a slot and have it come out saying, 'This is Joe Smith.'"

Analysts at the National Security Agency and CIA who are handling the tape are measuring two kinds of voice characteristics against previous recordings, experts said.

First, they measure the acoustics that give an idea of the physical features of a person's vocal tract - the shapes of the mouth, throat and nasal passages used in speech.

They also study the style of speech - the timing, speed and pitch - while looking for distinctive intonations, Berkovitz said.

Neither measurement is exact, but a close resemblance on both counts can strongly suggest that the voice is a match.

"You can say with some probability, but you can never be sure," said Kenneth Stevens, a Massachusetts Institute of Technology expert in speech analysis and synthesis.

In the case of bin Laden, where convincing the world that he is alive might be important, a skilled impersonator using a low-quality recording could fool some analysts, Berkovitz said.

"If you have someone who can read like bin Laden, they can have him read it," he said. "People can change their voices so easily."

Analysts poring over bin Laden's voiceprint would be looking at computerized "wave forms" that plot a voice's pitch, speed and volume into dark and light shapes resembling a topographical map, said Brian Moncur, technology director at Fonix Corp. of Salt Lake City, which sells speech recognition and synthesis software.

For instance, the wave forms of the word "America" from one or more known bin Laden recordings might be overlaid atop the wave form of "America" from the current audiotape.

The problem is that the wave forms never match each other exactly.

"People hardly ever pronounce the same word the same way twice, even in the same utterance," Berkovitz said.

Language analysis software can handle recordings in Arabic.

Wave forms "are pretty much language-independent," Moncur said. "We all use the same organs to make the same speech. It's a mathematical representation."


By Jim Krane
  • Bootie Cosgrove-Mather