Subjective and objective measurement of synthesized speech intelligibility in modern telephone conditions

This paper investigates how different telephone channels, represented by impairments introduced by modern telecommunication networks (e.g. speech coding, bandwidth limitation, and packet loss), affect the intelligibility of synthesized speech. Both subjective and objective assessments are used. Two speech intelligibility prediction models, PESQ Intelligibility and POLQA Intelligibility, are evaluated by comparing their predictions with subjectively obtained intelligibility scores. The results show that all the investigated degradations seriously impact the subjectively measured intelligibility of the synthesized speech. Furthermore, the correlations between PESQ Intelligibility predictions and subjective scores are too low for accurate prediction of speech intelligibility, whereas POLQA Intelligibility provides good intelligibility predictions when a closed-response experimental setup is used. © 2015 Elsevier B.V.
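The comparison between objective predictions and subjective scores can be illustrated with a correlation coefficient. The following is a minimal sketch, assuming a Pearson correlation computed per test condition; the abstract does not state which correlation measure or aggregation the authors used. The function name `pearson` and all numbers are hypothetical placeholders, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-condition values, invented for illustration only:
    # subjective intelligibility (fraction of items correctly identified)
    # and the corresponding output of some objective model.
    subjective = [0.95, 0.88, 0.74, 0.61, 0.42]
    predicted = [0.93, 0.90, 0.70, 0.65, 0.40]
    print(f"Pearson r = {pearson(subjective, predicted):.3f}")
```

A value of r close to 1 would indicate that the model ranks and scales the conditions much as listeners do; a low value corresponds to the kind of mismatch reported above for PESQ Intelligibility.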
