Adaptive log-spectral regression for in-car speech recognition using multiple distributed microphones

This letter addresses the problem of improving hands-free speech recognition performance across different car environments. We propose a new speech-enhancement approach based on optimized regression of log-spectra: the log-spectra of speech at a close-talking microphone are estimated from the signals of multiple spatially distributed microphones. The regression weights can be adapted automatically to different noise environments. Compared with the nearest distant microphone and with the generalized sidelobe canceller (GSC) adaptive beamformer, the proposed approach achieves average relative word error rate (WER) reductions of 58.5% and 10.3%, respectively, for isolated word recognition in 15 real car environments.
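The core idea, estimating close-talking log-spectra from several distant microphones, can be illustrated with a minimal sketch. It assumes a plain linear least-squares regression fitted independently for each frequency bin, with a bias term; the abstract does not specify the regression model or the adaptation scheme used in the letter, so the function names, array shapes, and the linear form here are illustrative assumptions rather than the authors' method.

```python
import numpy as np


def fit_log_spectral_regression(distant_logspec, close_logspec):
    """Fit per-bin linear regression weights (an assumed, simplified model).

    distant_logspec: array of shape (frames, mics, bins), log-spectra at the
        distributed microphones.
    close_logspec: array of shape (frames, bins), log-spectra at the
        close-talking microphone (the regression target).
    Returns an array of shape (bins, mics + 1); the last column is the bias.
    """
    frames, mics, bins = distant_logspec.shape
    weights = np.empty((bins, mics + 1))
    ones = np.ones((frames, 1))
    for k in range(bins):
        # Design matrix for bin k: one column per microphone plus a bias column.
        design = np.hstack([distant_logspec[:, :, k], ones])
        solution, *_ = np.linalg.lstsq(design, close_logspec[:, k], rcond=None)
        weights[k] = solution
    return weights


def apply_log_spectral_regression(distant_logspec, weights):
    """Estimate close-talking log-spectra as a weighted sum over microphones."""
    mic_weights = weights[:, :-1]  # shape (bins, mics)
    bias = weights[:, -1]          # shape (bins,)
    return np.einsum("fmk,km->fk", distant_logspec, mic_weights) + bias
```

In this sketch, adapting to a new noise environment would amount to refitting (or selecting) a different weight matrix for that environment; how the letter performs this adaptation is not stated in the abstract.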
