Speech Quality Assessment

This chapter surveys the methods and techniques used to assess speech quality. It summarizes some of the most commonly used listening tests, which are designed to obtain reliable ratings of the quality of processed speech from human listeners, and it discusses the considerations and cautions involved in conducting subjective listening tests successfully. Although listening tests are considered the gold standard for assessing speech quality, they can be costly and time-consuming. For that reason, much research effort has been devoted to devising objective measures that correlate highly with subjective rating scores. The chapter gives an overview of some of the most commonly used objective measures and discusses how well they correlate with subjective listening tests.
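As an illustration of what "correlating" an objective measure with subjective ratings can mean in practice, the sketch below computes Pearson's correlation coefficient between per-condition objective scores and mean opinion scores. It is a minimal example and not the chapter's own procedure: the function name `pearson_correlation` and all numeric values are hypothetical, chosen only to show the calculation.

```python
import math


def pearson_correlation(xs, ys):
    """Return Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one value per test condition (not taken from any study).
subjective_mos = [1.8, 2.4, 3.1, 3.6, 4.2]    # mean listener ratings
objective_score = [1.5, 2.2, 2.9, 3.8, 4.0]   # output of some objective measure

rho = pearson_correlation(objective_score, subjective_mos)
print(f"Pearson correlation: {rho:.3f}")
```

A coefficient close to 1 would suggest that the objective measure ranks and spaces the conditions much as the listeners did. A rank-based coefficient could be substituted when only the ordering of conditions matters.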


[40]  Richard H. Bartels,et al.  Algorithm 432 [C2]: Solution of the matrix equation AX + XB = C [F4] , 1972, Commun. ACM.

[41]  K. McGraw,et al.  Forming inferences about some intraclass correlation coefficients. , 1996 .

[42]  Mike P. Hollier,et al.  Non-intrusive speech-quality assessment using vocal-tract models , 2000 .

[43]  Yi Hu,et al.  Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[44]  J. Freidman,et al.  Multivariate adaptive regression splines , 1991 .

[45]  Robert E. Yantorno,et al.  Performance of the modified Bark spectral distortion as an objective speech quality measure , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[46]  Hugo Fastl,et al.  Psychoacoustics Facts and Models. 2nd updated edition , 1999 .

[47]  R. Lyman Ott.,et al.  An introduction to statistical methods and data analysis , 1977 .

[48]  D. W. Robinson,et al.  A re-determination of the equal-loudness relations for pure tones , 1956 .

[49]  J Kreiman,et al.  Comparing internal and external standards in voice quality judgments. , 1993, Journal of speech and hearing research.

[50]  Jian Kang Comparison of speech intelligibility between English and Chinese , 1998 .

[51]  Ernst H. Rothauser,et al.  A Comparison of Preference Measurement Methods , 1971 .

[52]  R.F. Kubichek,et al.  Speech quality assessment using expert pattern recognition , 1989, Conference Proceeding IEEE Pacific Rim Conference on Communications, Computers and Signal Processing.

[53]  Nobuhiko Kitawaki,et al.  Objective quality evaluation for low-bit-rate speech coding systems , 1988, IEEE J. Sel. Areas Commun..

[54]  Robert E. Yantorno,et al.  Improvement of MBSD by scaling noise masking threshold and correlation analysis with MOS difference instead of MOS , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[55]  W. Voiers,et al.  Diagnostic acceptability measure for speech communication systems , 1977 .

[56]  K. D. Kryter Methods for the Calculation and Use of the Articulation Index , 1962 .

[57]  Liu Jianjun,et al.  Intrusive speech transmission quality measurement in chinese environment , 2007, 2007 6th International Conference on Information, Communications & Signal Processing.

[58]  Yi Hu,et al.  Subjective comparison and evaluation of speech enhancement algorithms , 2007, Speech Commun..

[59]  J. Flanagan A Difference Limen for Vowel Formant Frequency , 1955 .

[60]  S. Blumstein,et al.  Invariant cues for place of articulation in stop consonants. , 1978, The Journal of the Acoustical Society of America.

[61]  Robert F. Kubichek,et al.  Vector quantization techniques for output-based objective speech quality , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[62]  Matti Karjalainen,et al.  A new auditory model for the evaluation of sound quality of audio systems , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[63]  A.W. Rix,et al.  The perceptual analysis measurement system for robust end-to-end speech quality assessment , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[64]  David J. Goodman,et al.  Subjective quality of the same speech transmission conditions in seven different countries , 1982, ICASSP.

[65]  Raymond D. Kent,et al.  Acoustic Analysis of Speech , 2009 .

[66]  John Makhoul,et al.  Towards perceptually consistent measures of spectral distance , 1976, ICASSP.

[67]  Dennis H. Klatt,et al.  Prediction of perceived phonetic distance from critical-band spectra: A first step , 1982, ICASSP.

[68]  Andrew Sekey,et al.  An Objective Measure for Predicting Subjective Quality of Speech Coders , 1992, IEEE J. Sel. Areas Commun..

[69]  W. A. Mvnso,et al.  Loudness , Its Definition , Measurement and Calculation , 2004 .

[70]  Vijay Parsa,et al.  Nonintrusive speech quality evaluation using an adaptive neurofuzzy inference system , 2005, IEEE Signal Processing Letters.

[71]  S. Dimolitsas,et al.  Objective speech distortion measures and their relevance to speech quality assessments , 1989 .

[72]  Vijay Parsa,et al.  On the use of Bayesian modeling for predicting noise reduction performance , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[73]  IEEE Recommended Practice for Speech Quality Measurements , 1969, IEEE Transactions on Audio and Electroacoustics.

[74]  H K Suen,et al.  Agreement, reliability, accuracy, and validity: Toward a clarification. , 1988 .

[75]  M.G. Bellanger,et al.  Digital processing of speech signals , 1980, Proceedings of the IEEE.

[76]  R. Forthofer,et al.  Rank Correlation Methods , 1981 .

[77]  J. Kreiman,et al.  Perceptual evaluation of voice quality: review, tutorial, and a framework for future research. , 1993, Journal of speech and hearing research.

[78]  Tiago H. Falk,et al.  Single-Ended Speech Quality Measurement Using Machine Learning Methods , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[79]  Paolino Usai,et al.  A subjective testing methodology for evaluating medium rate codecs for digital mobile radio applications , 1988, Speech Commun..

[80]  James M Kates,et al.  Effects of noise and distortion on speech quality judgments in normal-hearing and hearing-impaired listeners. , 2007, The Journal of the Acoustical Society of America.

[81]  Rudolf Mester,et al.  Spectral Entropy-Activity Classification in Adaptive Transform Coding , 1992, IEEE J. Sel. Areas Commun..