A detailed perceptual evaluation of MITalk was carried out to measure the intelligibility and comprehension of the synthetic speech produced by the system. Phoneme recognition, as measured by the Modified Rhyme Test, showed an average error rate of only 6.9% overall. Correct word recognition was 93.2% for a set of Harvard psychoacoustic sentences, whereas it was 78.7% for a set of semantically anomalous sentences developed at Haskins Laboratories. Comprehension of continuous, fluent synthetic speech, assessed with multiple-choice questions, showed good-to-excellent levels of performance and continued improvement over time. No major problems were encountered during the generation of the test materials used in this evaluation, nor were any serious errors identified in the conceptual design of the MIT text-to-speech system. These perceptual results suggest that very high-quality, natural-sounding synthetic speech can now be produced automatically from unrestricted English text, and that such a text-to-speech system could well be implemented in the very near future in applied settings such as devices for computer-aided instruction or a reading machine for the blind.