Strategies for improving audible quality and speech recognition accuracy of reverberant speech

We showed previously (Gillespie and Atlas, Int. Conf. on Acoustics, Speech, and Sig. Processing, 2002) that penalizing long-term reverberation energy is more effective than maximizing the signal-to-reverberation ratio (SRR) for improving audible quality and automatic speech recognition (ASR) accuracy. Building on this result, we propose a blind approach to speech dereverberation that reduces the length of the equalized speaker-to-receiver impulse response. The approach reduces the long-term correlation in the linear prediction (LP) residual of reverberant speech. We show that it improves both the audible quality (measured with subjective listening tests) and the ASR accuracy (measured with two commercial ASR systems) of reverberant speech.
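To make the quantity named in the abstract concrete, the following is a minimal sketch of how long-term correlation in an LP residual might be measured. It is not the authors' dereverberation algorithm, only an illustration of the measurement. Everything specific here is an assumption made for the example: the synthetic AR(2) "speech" signal, the single-echo room response (delay of 200 samples, gain 0.6), the LP order of 10, the lag range 50 to 400, and the helper names `lp_coefficients`, `lp_residual`, and `long_term_correlation`.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method linear prediction: solve R a = r for a."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    return np.linalg.solve(R, r[1:])

def lp_residual(x, order=10):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n - k]."""
    x = np.asarray(x, dtype=float)
    a = lp_coefficients(x, order)
    pred = np.zeros_like(x)
    for k in range(1, order + 1):
        pred[k:] += a[k - 1] * x[:-k]
    return x - pred

def long_term_correlation(e, min_lag, max_lag):
    """Sum of squared normalized autocorrelation over lags min_lag..max_lag."""
    e = e - e.mean()
    n = len(e)
    r0 = np.dot(e, e)
    total = 0.0
    for k in range(min_lag, max_lag + 1):
        total += (np.dot(e[:n - k], e[k:]) / r0) ** 2
    return total

# Hypothetical test signal: white excitation through a stable AR(2) filter.
rng = np.random.default_rng(0)
n = 16000
excitation = rng.standard_normal(n)
clean = np.zeros(n)
for i in range(n):
    clean[i] = excitation[i]
    if i >= 1:
        clean[i] += 1.3 * clean[i - 1]
    if i >= 2:
        clean[i] -= 0.6 * clean[i - 2]

# Hypothetical room response: direct path plus one echo (assumed values).
delay, gain = 200, 0.6
reverberant = clean.copy()
reverberant[delay:] += gain * clean[:-delay]

clean_score = long_term_correlation(lp_residual(clean), 50, 400)
rev_score = long_term_correlation(lp_residual(reverberant), 50, 400)
print("clean:", clean_score, "reverberant:", rev_score)
```

In this toy setting the short-order predictor removes the signal's short-term structure but cannot model the echo, so the echo survives in the residual as correlation at long lags. A score of this kind is the sort of quantity a blind equalizer could try to drive down, although the actual criterion and adaptation used in the paper are not specified in the abstract.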

[1] Henrique S. Malvar, et al. Blind deconvolution of reverberated speech signals via regularization. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001.

[2] Les E. Atlas, et al. Acoustic diversity for improved speech recognition in reverberant environments. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.

[3] John Mourjopoulos, et al. Digital Equalization of Room Acoustics. 1994.

[4] Nelson Morgan, et al. Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments. 1998.

[5] J. Flanagan, et al. Computer-steered microphone arrays for sound transduction in large rooms. 1985.

[6] Wenqing Jiang, et al. Adaptive Noise Reduction of Speech Signals. 2000.

[7] F. Itakura, et al. Dereverberation of Speech Signals Based on Sub-Band Envelope Estimation. 1991.

[8] Henrique S. Malvar, et al. Speech dereverberation via maximum-kurtosis subband adaptive filtering. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221), 2001.