On-Line Feature and Acoustic Model Space Compensation for Robust Speech Recognition in Car Environment

Developing a robust speech-based man-machine interface for cars requires compensating for both speaker variability and the effects of the acoustic environment. In this work, an on-line feature and acoustic model compensation technique (MATE-MEMLIN) is proposed to compensate for speaker variability and the acoustic car environment. MATE-MEMLIN combines two techniques: augMented stAte space acousTic modEl (MATE) and Multi-Environment Model based Linear Normalization (MEMLIN). MATE defines expanded acoustic models that compensate for speaker frequency variability using linear transformations estimated from data. MEMLIN is an empirical feature vector normalization technique that has been shown to be effective in compensating for environment mismatch. Experiments with the Spanish SpeechDat Car database were carried out to study the performance of the proposed technique in a real car environment, and they showed a substantial mean improvement in Word Error Rate (WER).
