Robust Feature Combination for Speech Recognition Using Linear Microphone Array in a Car

When speech recognition is performed in a car, two robustness issues must be taken into account. The first is robustness to noisy acoustic conditions, which has been one of the most popular research topics in in-vehicle speech recognition. The second, robustness to unstable calibration of the audio input, has not attracted much attention. As a result, recognition performance can degrade greatly in a real application if an input device such as a microphone array is badly calibrated. We propose robust feature combination in the MFCC domain using speech inputs from a linear microphone array. The approach enables practical speech recognition in car environments that is robust to both noise and calibration errors. Even simple MFCC averaging is effective, and a new algorithm, hypothesis-based feature combination (HBFC), improves performance further. We also extend cepstral variance normalization to variance re-scaling, which makes the feature combination approach more robust. Experiments on data recorded in a moving car confirm the advantages of the proposed algorithms.
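
To make the simplest of these ideas concrete, the following is a minimal sketch of frame-wise MFCC averaging across microphone channels, followed by standard per-utterance cepstral mean and variance normalization. It is an illustration under stated assumptions, not the authors' implementation: the function names `average_mfcc` and `mean_variance_normalize` are hypothetical, the channels are assumed to be time-aligned with identical frame counts, and the normalization shown is the conventional one, not the variance re-scaling extension or HBFC, whose details the abstract does not give.

```python
from statistics import fmean, pstdev


def average_mfcc(channels):
    """Average MFCC vectors frame by frame across microphone channels.

    `channels` is a list with one entry per microphone; each entry is a
    list of frames, and each frame is a list of MFCC coefficients.
    Assumption: all channels are time-aligned and have the same shape.
    """
    n_frames = len(channels[0])
    if any(len(ch) != n_frames for ch in channels):
        raise ValueError("all channels must have the same number of frames")
    return [
        # zip groups the i-th coefficient of frame t from every channel
        [fmean(coeffs) for coeffs in zip(*(ch[t] for ch in channels))]
        for t in range(n_frames)
    ]


def mean_variance_normalize(frames, eps=1e-8):
    """Conventional per-utterance cepstral mean and variance normalization.

    Each coefficient track is shifted to zero mean and scaled to unit
    variance. `eps` guards against division by zero on constant tracks.
    """
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    stds = [max(pstdev(col), eps) for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


# Toy input: two microphones, two frames, two coefficients per frame.
mic_a = [[1.0, 2.0], [3.0, 4.0]]
mic_b = [[3.0, 0.0], [5.0, 2.0]]

combined = average_mfcc([mic_a, mic_b])
normalized = mean_variance_normalize(combined)
```

Averaging in the feature domain needs no knowledge of the array geometry or microphone gains, which is one plausible reading of why such a combination could tolerate poor calibration better than signal-domain beamforming.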
