Clap your hands! Calibrating spectral subtraction for dereverberation

Reverberation, as captured by room microphones, severely degrades the performance of automatic speech recognition (ASR) systems. We investigate dereverberation by spectral subtraction as proposed by Lebart and Boucher and introduce a simple approach to estimating the required decay parameter by clapping hands. Experiments on a small-vocabulary continuous speech recognition task on read speech show that calibrated dereverberation reduces the word error rate (WER) from 73.2% to 54.7% for the best microphone. In combination with system adaptation, the WER drops to 28.2%, a relative degradation of only 16% compared with using a headset instead of a room microphone.
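The abstract names two ingredients: a decay parameter measured from a hand clap, and spectral subtraction of late reverberation. The Python sketch below illustrates both under our own assumptions. It is not the authors' implementation. It follows the statistical room model commonly associated with Lebart et al., in which the room impulse response is treated as exponentially decaying noise whose energy envelope is exp(-2 * delta * t), with delta = 3 ln(10) / T60. The decay is estimated here by Schroeder backward integration of the clap's energy followed by a linear fit between -5 and -25 dB, which is an assumed convention rather than the paper's stated procedure. The function names `decay_from_clap` and `late_reverb_gain` and the parameters `fit_range_db`, `delay_s`, and `floor` are hypothetical.

```python
import numpy as np


def decay_from_clap(clap, fs, fit_range_db=(-5.0, -25.0)):
    """Estimate the decay rate delta (1/s) and T60 (s) from a hand-clap recording.

    Hypothetical helper. Assumes `clap` starts at the clap onset and that its
    tail is dominated by reverberation rather than background noise.
    """
    energy = np.asarray(clap, dtype=float) ** 2
    # Schroeder backward integration: energy remaining from each sample onward.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    # Fit a line to the decay curve inside an assumed dB range (default -5 to -25 dB).
    upper, lower = fit_range_db
    idx = np.nonzero((edc_db <= upper) & (edc_db >= lower))[0]
    slope_db_per_s, _ = np.polyfit(idx / fs, edc_db[idx], 1)
    t60 = -60.0 / slope_db_per_s
    # Energy decays as exp(-2 * delta * t), so a 60 dB drop gives delta = 3 ln(10) / T60.
    delta = 3.0 * np.log(10.0) / t60
    return delta, t60


def late_reverb_gain(power_spec, delta, hop_s, delay_s=0.05, floor=0.1):
    """Power-domain spectral-subtraction gain against late reverberation (simplified sketch).

    power_spec: array of shape (frames, bins) holding |X(l, k)|^2 from any STFT.
    The late-reverberation power in frame l is predicted from frame l - d,
    attenuated by exp(-2 * delta * T_d), where d = round(delay_s / hop_s).
    `delay_s` and `floor` are assumed illustrative values, not taken from the paper.
    Scale STFT coefficients by the square root of the returned gain.
    """
    P = np.asarray(power_spec, dtype=float)
    d = max(1, int(round(delay_s / hop_s)))
    t_d = d * hop_s  # delay actually used after rounding to whole frames
    late = np.zeros_like(P)
    late[d:] = np.exp(-2.0 * delta * t_d) * P[:-d]
    # Power spectral subtraction expressed as a gain, limited to [floor, 1].
    gain = 1.0 - late / np.maximum(P, 1e-12)
    return np.clip(gain, floor, 1.0)
```

For a pure exponential tail with decay rate delta, the predicted late power equals the observed power once d frames have passed, so the gain falls to `floor`; frames where speech rises well above the predicted tail keep gains close to 1. The single calibration value delta, obtained once per room from the clap, is all the gain computation needs besides the observed spectrogram.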

[1] J. Boucher et al., A New Method Based on Spectral Subtraction for Speech, 2001.

[2] Mark J. F. Gales et al., Mean and variance adaptation within the MLLR framework, 1996, Comput. Speech Lang.

[3] W. Kellermann et al., Model-based dereverberation of speech in the mel-spectral domain, 2008, 2008 42nd Asilomar Conference on Signals, Systems and Computers.

[4] Peter Vary et al., A binaural room impulse response database for the evaluation of dereverberation algorithms, 2009, 2009 16th International Conference on Digital Signal Processing.

[5] E.A.P. Habets et al., Single-Channel Speech Dereverberation based on Spectral Subtraction, 2004.

[6] Jay G. Wilpon et al., A study of speech recognition for children and the elderly, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[7] Georg Stemmer, Modeling variability in speech recognition, 2004.

[8] Chin-Hui Lee et al., Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.

[9] Rüdiger Hoffmann et al., A new feature analysis method for robust ASR in reverberant environments based on the harmonic structure of speech, 2008, 2008 16th European Signal Processing Conference.

[10] Wolfgang Wahlster et al., Verbmobil: Foundations of Speech-to-Speech Translation, 2000, Artificial Intelligence.