Autonomous measurement of speech intelligibility utilizing automatic speech recognition

Measures of speech intelligibility are an essential tool for diagnosing hearing impairment and for tuning hearing aid parameters. This study explores the potential of automatic speech recognition (ASR) for conducting autonomous listening tests. In such tests (here, the Oldenburg sentence matrix test), participants' responses are usually logged by a human supervisor. The target value is the speech reception threshold (SRT), i.e., the signal-to-noise ratio at which 50% speech intelligibility is achieved. We explore what ASR error rates can be obtained for such responses and how ASR errors affect the measured SRT value. To this end, a speech database was recorded that contains utterances from 20 speakers and covers different levels of language complexity, ranging from simple five-word sentences to utterances as produced in typical human-human interactions during testing. For the most complex speech material, the achievable SRT accuracy was not satisfactory. For sentences without out-of-vocabulary words, however, the ASR error rate was below 1.3%, which is sufficient to obtain a test-retest reliability of 0.5 dB, identical to the reliability of human-supervised tests.
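
The SRT definition used above (the signal-to-noise ratio at which 50% intelligibility is reached) can be illustrated with a minimal sketch. It assumes a logistic psychometric function, which is a common modelling choice for matrix sentence tests but is not stated in this abstract. The function names and the numerical values (an SRT of -7.0 dB and a slope of 0.17 per dB) are hypothetical and chosen for illustration only.

```python
import math


def intelligibility(snr_db, srt_db, slope_per_db):
    """Assumed logistic psychometric function.

    Returns the fraction of words understood at snr_db, for a listener
    whose 50% point is srt_db and whose curve has slope slope_per_db
    (fraction per dB) at that point.
    """
    return 1.0 / (1.0 + math.exp(4.0 * slope_per_db * (srt_db - snr_db)))


def find_srt(curve, lo_db=-30.0, hi_db=30.0, target=0.5, tol=1e-6):
    """Locate the SNR at which a monotonically increasing curve reaches target.

    Plain bisection; assumes curve(lo_db) < target < curve(hi_db).
    """
    lo, hi = lo_db, hi_db
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if curve(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical listener (illustrative values, not taken from the study).
TRUE_SRT_DB = -7.0
SLOPE_PER_DB = 0.17


def listener(snr_db):
    return intelligibility(snr_db, TRUE_SRT_DB, SLOPE_PER_DB)


estimated_srt_db = find_srt(listener)
print(f"intelligibility at SRT: {listener(TRUE_SRT_DB):.2f}")
print(f"estimated SRT: {estimated_srt_db:.3f} dB")
```

In an actual listening test the curve is not known in advance; the SRT is estimated from scored responses, which is why scoring errors introduced by ASR can shift the measured value.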
