Native and non-native speaker judgements on the quality of synthesized speech

The difference between native and non-native speakers’ naturalness judgements of synthetic speech is investigated. Similarity/difference judgements are analysed via multidimensional scaling and compared to mean opinion scores (MOS). It is shown that, although the two groups generally behave in a similar manner, the variance of non-native speaker judgements is generally higher. While both groups of subjects can clearly distinguish natural speech from the best synthetic examples, the groups’ responses to different artefacts present in the synthetic speech can vary.

Index Terms: speech synthesis, evaluation, non-native
