Experimental tools to evaluate intelligibility of text-to-speech (TTS) synthesis: effects of voice gender and signal quality

Two experiments are reported that constitute new methods for evaluation of text-to-speech (TTS) synthesis from the user’s perspective. Experiment 1, using sentence stimuli, and Experiment 2, using discrete word stimuli, investigate the effect of voice gender and signal quality on the intelligibility of three TTS synthesis systems from the user’s point of view. Accuracy scores and reaction time were recorded as on-line, implicit indices of intelligibility during phoneme detection tasks. It was hypothesized that male voice TTS would be more intelligible than female voice TTS, and that low quality signals would reduce intelligibility. Results indicate an interaction between voice gender and signal quality which is dependent on the TTS system. We suggest that intelligibility from the user’s perspective is modulated by several factors and there is a need to tailor systems to particular commercial applications. Methods to achieve commercially relevant evaluation of TTS synthesis are discussed.
