Towards a Better Understanding of the Effect of Reverberation on Speech Recognition Performance

To tailor dereverberation approaches to automatic speech recognition (ASR) systems, it is important to understand how reverberation affects ASR performance. In this work, the effect is analyzed by varying the shape of the room impulse response (RIR) using two design parameters that are useful for defining the target response of typical dereverberation algorithms: a level of attenuation A and a delay T. Together they determine how strongly the RIR coefficients corresponding to reflections that arrive at least T after the direct-path component are attenuated. Convolving clean speech signals with the modified RIRs simulates ideal late-reverberation suppression. By varying A and T, the influence of these design parameters on the recognition rates is investigated, and guidelines for adjusting dereverberation algorithms to ASR systems are deduced.
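The RIR modification described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details are assumptions made for illustration only: the function names `attenuate_late_reverb` and `simulate_ideal_suppression` are hypothetical, A is taken as an amplitude attenuation in dB, T is given in seconds, the direct path is located as the largest-magnitude tap, and the synthetic decaying-noise RIR stands in for a measured one.

```python
import numpy as np


def attenuate_late_reverb(rir, fs, delay_s, atten_db):
    """Attenuate RIR taps arriving at least delay_s after the direct path.

    Assumptions (not from the paper): the direct path is the tap with the
    largest magnitude, and atten_db is an amplitude attenuation in dB.
    """
    rir = np.asarray(rir, dtype=float)
    direct = int(np.argmax(np.abs(rir)))          # index of the direct-path tap
    start = direct + int(round(delay_s * fs))     # first tap treated as "late"
    gain = 10.0 ** (-atten_db / 20.0)             # dB -> linear amplitude factor
    modified = rir.copy()
    modified[start:] *= gain                      # no-op if start is past the end
    return modified


def simulate_ideal_suppression(clean, rir, fs, delay_s, atten_db):
    """Convolve clean speech with the modified RIR."""
    return np.convolve(clean, attenuate_late_reverb(rir, fs, delay_s, atten_db))


# Toy data: a synthetic exponentially decaying RIR and a noise "speech" signal.
fs = 16000
rng = np.random.default_rng(0)
rir = 0.1 * rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800.0)
rir[0] = 1.0                                      # direct-path component
clean = rng.standard_normal(fs)                   # placeholder for clean speech

# Sweep the two design parameters, T (seconds) and A (dB).
for delay_s in (0.02, 0.05):
    for atten_db in (10.0, 30.0):
        y = simulate_ideal_suppression(clean, rir, fs, delay_s, atten_db)
        print(delay_s, atten_db, y.shape)
```

In an experiment of the kind described, each output signal would then be passed to the recognizer so that the recognition rate can be tabulated against A and T. That step depends on the ASR system and is omitted here.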
