Model-based integration of reverberation for noise-adaptive near-end listening enhancement

Speech intelligibility is an important factor for successful speech communication in today’s society. So-called near-end listening enhancement (NELE) algorithms aim at improving speech intelligibility in conditions where the (clean) speech signal is accessible and can be modified prior to its presentation. However, many of these algorithms only consider the detrimental effect of noise and disregard the effect of reverberation. Therefore, in this paper we propose to additionally incorporate the detrimental effects of reverberation into noise-adaptive near-end listening enhancement algorithms. Based on the Speech Transmission Index (STI), which is widely used for speech intelligibility prediction, the effect of reverberation is accounted for as an additional noise power term. The resulting combined noise power term is used in a state-of-the-art noise-adaptive NELE algorithm. Simulations using two objective measures, the STI and the short-time objective intelligibility (STOI) measure, demonstrate the potential of the proposed approach to improve the predicted speech intelligibility in noisy and reverberant conditions.

Index Terms: speech-in-noise enhancement, reverberation, speech intelligibility, near-end listening enhancement
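
To illustrate how reverberation can be expressed as an additional noise power term within the STI framework, the following minimal sketch uses the classical modulation transfer function (MTF) for an exponentially decaying reverberant field, m_rev(F) = 1 / sqrt(1 + (2πF·T60/13.8)^2), and solves for the noise power that would produce the same modulation reduction. This is a sketch under stated assumptions, not necessarily the formulation used in the paper: the function names, the single-band treatment, and the choice of one modulation frequency are all hypothetical.

```python
import math


def mtf_reverb(mod_freq_hz: float, t60_s: float) -> float:
    """Classical MTF of an exponentially decaying reverberant field.

    Assumes an ideal diffuse field with reverberation time t60_s (seconds).
    """
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * mod_freq_hz * t60_s / 13.8) ** 2)


def mtf_noise(speech_power: float, noise_power: float) -> float:
    """MTF of stationary additive noise: P_s / (P_s + P_n)."""
    return speech_power / (speech_power + noise_power)


def equivalent_reverb_noise_power(
    speech_power: float, noise_power: float, mod_freq_hz: float, t60_s: float
) -> float:
    """Noise power P_r that mimics the modulation reduction due to reverberation.

    Chosen so that P_s / (P_s + P_n + P_r) == m_rev * P_s / (P_s + P_n).
    Illustrative assumption, not the paper's exact expression.
    """
    m_rev = mtf_reverb(mod_freq_hz, t60_s)
    return (speech_power + noise_power) * (1.0 / m_rev - 1.0)


def apparent_snr_db(speech_power: float, total_noise_power: float) -> float:
    """Apparent SNR in dB given the combined (noise + reverberation) power."""
    return 10.0 * math.log10(speech_power / total_noise_power)


if __name__ == "__main__":
    p_s, p_n = 1.0, 0.1          # hypothetical band powers (10 dB SNR)
    f_mod, t60 = 2.0, 1.0        # hypothetical modulation frequency and T60
    p_r = equivalent_reverb_noise_power(p_s, p_n, f_mod, t60)
    print("equivalent reverberation noise power:", p_r)
    print("apparent SNR (dB):", apparent_snr_db(p_s, p_n + p_r))
```

A noise-adaptive NELE algorithm could then be driven by the combined power `p_n + p_r` in place of `p_n` alone. In a full STI computation this would be repeated per octave band and per modulation frequency.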
