Improved speech modelling and recognition using a new training algorithm based on outlier-emphasis for non-stationary state HMM

In this study, we develop a modified maximum likelihood algorithm for optimally estimating the state-dependent polynomial parameters of the nonstationary-state HMM. The new training method controls how strongly outliers in the training data influence the constructed models. On an alphabet recognition task, emphasizing outliers improves performance: the error rate is reduced by 14% for the linear-trend HMMs and by 7.5% for the stationary-state HMMs, relative to conventional models trained by the Viterbi algorithm under the joint-state maximum likelihood criterion. The properties of the nonstationary-state HMM trained with the proposed approach are analysed by examining the goodness-of-fit of real speech data to the polynomial trajectories in the model.
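To make the idea of state-dependent polynomial parameters concrete, the following is a minimal sketch of how a polynomial trend could be fitted to the frames aligned to one state, with per-frame weights that can emphasize or de-emphasize outliers. It is an illustration only, not the algorithm of the paper: the function names `fit_trend` and `outlier_weights`, the exponent `alpha`, and the specific weighting rule are assumptions introduced here, since the abstract does not specify the weighting. For a Gaussian state with a polynomial mean and a fixed segmentation, the weighted maximum likelihood estimate of the coefficients reduces to weighted least squares, which is what the sketch uses.

```python
import numpy as np

def fit_trend(frames, order=1, weights=None):
    """Weighted least-squares fit of a polynomial trend to one state segment.

    frames  : (T, D) array of feature vectors aligned to the state.
    order   : polynomial order (0 = stationary state, 1 = linear trend).
    weights : optional (T,) non-negative per-frame weights (default: all ones).
    Returns (coeffs, var): coeffs is (order + 1, D), var is the (D,) weighted
    residual variance.
    """
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    t = np.arange(T, dtype=float)                  # time since state entry
    X = np.vander(t, order + 1, increasing=True)   # columns: 1, t, t^2, ...
    w = np.ones(T) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)[:, None]
    # Weighted least squares via row scaling by sqrt(w).
    coeffs, *_ = np.linalg.lstsq(sw * X, sw * frames, rcond=None)
    resid = frames - X @ coeffs
    var = (w[:, None] * resid ** 2).sum(axis=0) / w.sum()
    return coeffs, var

def outlier_weights(frames, coeffs, var, alpha=1.0, floor=1e-8):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    alpha > 0 emphasizes frames far from the current trajectory,
    alpha < 0 de-emphasizes them, and alpha = 0 gives uniform weights,
    i.e. the conventional unweighted estimate.
    """
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    X = np.vander(np.arange(T, dtype=float), coeffs.shape[0], increasing=True)
    resid = frames - X @ coeffs
    # Variance-normalized squared distance, averaged over feature dimensions.
    d2 = (resid ** 2 / np.maximum(var, floor)).mean(axis=1)
    return (1.0 + d2) ** alpha
```

Under these assumptions, one re-estimation pass would call `fit_trend` without weights, compute `outlier_weights` from the resulting fit, and call `fit_trend` again with those weights. Setting `order=0` corresponds to a stationary-state model and `order=1` to a linear trend.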
