Classification of methods used for assessment of text-to-speech systems according to the demands placed on the listener

Abstract
A classification of methods used for the assessment of text-to-speech (TTS) systems, according to the demands they place on the listener, is proposed and discussed. The classification follows the four traditional scale levels: Nominal, Ordinal, Interval and Ratio. A fifth level, the Supra-Nominal, which involves memory processes, is also proposed. The methods are divided into qualitative, non-metric methods and quantitative, metric methods. The outcome is that the highest metric assessment level (Ratio) is not necessarily the level that places the highest demands on the listener. On the contrary, the Nominal level, which supports a non-metric qualitative approach, places even higher demands on the listener. In addition, factors that affect the outcome regardless of the level at which the assessment takes place are discussed in relation to ITU-TS and ITU-R recommendations: the number of source and speech content conditions, dynamic response range, subjects, training, degree of user involvement and listening level.
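The four scale levels differ in which summary statistics are meaningful for the listener responses they yield. As a minimal sketch, assuming the conventional permissible-statistics view associated with Stevens (mode for nominal data, median for ordinal data, arithmetic mean for interval data, geometric mean for ratio data such as magnitude estimates), the Python snippet below picks a central-tendency measure for each level. The names `ScaleLevel` and `central_tendency` are hypothetical illustrations, not taken from the paper, and the proposed Supra-Nominal level is left out because no statistic for it is specified here.

```python
from enum import Enum
from statistics import geometric_mean, mean, median_low, mode


class ScaleLevel(Enum):
    # Hypothetical labels for the four traditional scale levels.
    NOMINAL = 1   # categories only, e.g. which word the listener reported
    ORDINAL = 2   # rank order only, e.g. a preference ranking of systems
    INTERVAL = 3  # equal distances, e.g. category ratings treated as interval data
    RATIO = 4     # true zero, e.g. magnitude estimates


def central_tendency(values, level):
    """Return a central-tendency statistic permissible at the given level.

    Assumption: follows the common permissible-statistics convention,
    not a procedure prescribed by the paper.
    """
    if level is ScaleLevel.NOMINAL:
        # Only category counts are meaningful, so report the most frequent response.
        return mode(values)
    if level is ScaleLevel.ORDINAL:
        # median_low always returns an observed rank instead of averaging two ranks.
        return median_low(values)
    if level is ScaleLevel.INTERVAL:
        # Equal intervals make the arithmetic mean meaningful.
        return mean(values)
    # Ratio data: the geometric mean respects ratios; all values must be positive.
    return geometric_mean(values)


print(central_tendency(["yes", "no", "yes"], ScaleLevel.NOMINAL))
print(central_tendency([1, 3, 2, 2], ScaleLevel.ORDINAL))
print(central_tendency([3, 4, 5], ScaleLevel.INTERVAL))
print(central_tendency([2, 8], ScaleLevel.RATIO))
```

The design choice worth noting is that each step up the hierarchy admits the statistics of the levels below it plus a stronger one, so the function simply returns the strongest measure for the level supplied.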
