A robust compensation strategy for extraneous acoustic variations in spontaneous speech recognition

We propose a robust compensation strategy for dealing effectively with extraneous acoustic variations in spontaneous speech recognition. The strategy extends speaker adaptive training: it uses hidden Markov model (HMM) parameter transformations to normalize the extraneous variations in the training data according to a set of predefined conditions. A "compact" model and the associated prior probability density functions (PDFs) of the transformation parameters are estimated using the maximum likelihood criterion. In the testing phase, the compact model and the prior PDFs are used to search for the unknown word sequence based on Bayesian predictive classification (BPC). The proposed strategy is evaluated on the Switchboard task, where it is used to deal with three types of extraneous variation and mismatch in conversational speech recognition: pronunciation variations, inter-speaker variability, and telephone handset mismatch. Experimental results show that a moderate reduction in word error rate is achieved in comparison with a well-trained baseline HMM system under identical experimental conditions.
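
The BPC decision step can be illustrated with a deliberately small sketch. It is not the system described above. It assumes scalar observations, one Gaussian per class, and a single additive bias on the class mean as the only transformation, with a Gaussian prior on that bias. Under those assumptions, integrating the bias out gives a predictive density that is again Gaussian, with the prior mean added to the class mean and the prior variance added to the class variance. All names here (`predictive_log_density`, `bpc_classify`, the class labels, and the numbers) are hypothetical and chosen only for illustration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def predictive_log_density(x, mu, var, prior_mean, prior_var):
    """Log predictive density with the bias b integrated out.

    Assumed model: x ~ N(mu + b, var), b ~ N(prior_mean, prior_var).
    The integral over b has the closed form
    N(x; mu + prior_mean, var + prior_var).
    """
    return log_gauss(x, mu + prior_mean, var + prior_var)


def bpc_classify(x, classes, prior_mean, prior_var):
    """Return the class label with the highest predictive log density.

    classes maps a label to (mu, var) of the compact (normalized) model.
    """
    return max(
        classes,
        key=lambda name: predictive_log_density(
            x, *classes[name], prior_mean, prior_var
        ),
    )


# Hypothetical compact model: two classes with normalized means 0 and 3.
classes = {"a": (0.0, 1.0), "b": (3.0, 1.0)}

# Test condition shifts observations by roughly +1.5 (assumed prior on the bias).
label_with_prior = bpc_classify(2.0, classes, prior_mean=1.5, prior_var=0.5)

# Ignoring the mismatch (a near-zero bias prior) behaves like a plug-in decision.
label_plug_in = bpc_classify(2.0, classes, prior_mean=0.0, prior_var=1e-9)
```

In this toy setting the observation 2.0 lies closer to the unshifted mean of class "b", so the plug-in decision picks "b", while accounting for the bias prior moves the class means to 1.5 and 4.5 and the decision changes to "a". A full recognizer would apply the same idea to HMM state densities inside the word-sequence search rather than to isolated scalars.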
