Experimental tools to evaluate intelligibility of text-to-speech (TTS) synthesis: effects of voice gender and signal quality

Two experiments are reported that present new methods for evaluating text-to-speech (TTS) synthesis from the user’s perspective. Experiment 1, using sentence stimuli, and Experiment 2, using discrete word stimuli, investigated the effects of voice gender and signal quality on the intelligibility of three TTS synthesis systems. Accuracy scores and reaction times were recorded as on-line, implicit indices of intelligibility during phoneme detection tasks. It was hypothesized that male-voice TTS would be more intelligible than female-voice TTS, and that low-quality signals would reduce intelligibility. Results indicate an interaction between voice gender and signal quality that depends on the TTS system. We suggest that intelligibility from the user’s perspective is modulated by several factors and that systems need to be tailored to particular commercial applications. Methods for achieving commercially relevant evaluation of TTS synthesis are discussed.