Speech recognition with matrix-MCE based two-dimension-cepstrum in cars

Background noise usually degrades the performance of speech recognition. To reduce its influence, this study proposes matrix minimum classification error (matrix-MCE, MMCE) training, which efficiently minimizes the classification error of the two-dimension-cepstrum (TDC). Template matching then employs a Gaussian mixture model (GMM). Performance is evaluated on a set of isolated Mandarin digits. Experimental results indicate that MMCE-based TDC is robust in noisy environments.
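For readers unfamiliar with TDC, the following is a minimal sketch of how a two-dimensional cepstrum feature matrix is commonly computed: take the log magnitude spectrum of each frame, then apply a 2-D inverse DFT across both time and frequency and keep the low-order corner. The function name, frame sizes, and the numbers of retained coefficients are illustrative assumptions, not values from the paper, and taking the real part is a simplification. The MMCE training and GMM matching stages are not shown.

```python
import numpy as np

def two_dimensional_cepstrum(signal, frame_len=256, hop=128, n_time=4, n_quef=8):
    """Return a small TDC feature matrix for one utterance (illustrative only).

    All parameter defaults are assumptions chosen for the example.
    """
    signal = np.asarray(signal, dtype=float)
    # Split the utterance into overlapping, Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Log magnitude spectrum of each frame (rows: time, columns: frequency).
    log_spec = np.log(np.abs(np.fft.fft(frames, axis=1)) + 1e-10)
    # 2-D inverse DFT over the time and frequency axes; keep the real part.
    tdc = np.real(np.fft.ifft2(log_spec))
    # The low-order corner gives a fixed-size matrix regardless of utterance length.
    return tdc[:n_time, :n_quef]
```

Because only the low-order corner is kept, utterances of different lengths map to feature matrices of the same size, which is convenient for matching isolated words such as digits.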
