An intelligibility metric based on a simple model of speech communication

Instrumental measures of speech intelligibility typically produce an index between 0 and 1 that is monotonically related to listening test scores. As such, these measures are dimensionless and do not represent physical quantities. In this paper, we propose a new instrumental intelligibility metric that expresses speech intelligibility in bits per second. The proposed metric builds upon an existing intelligibility metric that was motivated by information theory. Our main contribution is the use of a statistical model of speech communication that accounts for noise inherent in the speech production process. Experiments show that the proposed metric performs at least as well as existing state-of-the-art intelligibility metrics.
