Use of generalized dynamic feature parameters for speech recognition

In this study, we develop a new hidden Markov model (HMM) that integrates generalized dynamic feature parameters into the model structure, and we evaluate it using maximum-likelihood (ML) and minimum-classification-error (MCE) pattern recognition approaches. Besides minimizing the error rate directly, the MCE approach removes the need for artificial constraints on the weighting functions that define the generalized dynamic parameters; the model formulation based on the ML approach required such constraints. We design a loss function for minimizing the error rate specifically for the new model, and derive an analytical form of its gradient, which enables the implementation of the MCE approach. We investigate the convergence property of the MCE-based training procedure, and experimental results on a standard TIMIT phonetic classification task demonstrate a 13.4% error rate reduction compared with the ML approach.
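
As a rough illustration of the ingredients named above, the sketch below computes a dynamic feature as a weighted sum of neighbouring static frames and evaluates a sigmoid MCE loss together with its derivative. It is a minimal sketch under stated assumptions, not the model of this study: the abstract does not give the exact form of the weighting functions, the misclassification measure, or the loss, so the windowed sum, the smoothed-maximum measure, the sigmoid loss, and all function and parameter names (`dynamic_features`, `misclassification_measure`, `mce_loss`, `eta`, `gamma`) are assumptions drawn from common MCE practice.

```python
import math


def dynamic_features(frames, weights):
    """Assumed form: y_t = sum_k w_k * x_{t+k}, with edge frames replicated.

    frames  : list of static feature vectors (lists of floats)
    weights : odd-length list of weights, centred on k = 0
    """
    half = len(weights) // 2
    n_frames = len(frames)
    dim = len(frames[0])
    out = []
    for t in range(n_frames):
        y = [0.0] * dim
        for i, w in enumerate(weights):
            # Clamp the neighbour index so the window stays inside the utterance.
            s = min(max(t + i - half, 0), n_frames - 1)
            for d in range(dim):
                y[d] += w * frames[s][d]
        out.append(y)
    return out


def misclassification_measure(scores, correct, eta=1.0):
    """Assumed measure: d = -g_correct + (1/eta) * log(mean_j exp(eta * g_j)), j != correct."""
    rivals = [g for j, g in enumerate(scores) if j != correct]
    m = max(rivals)  # subtract the max for numerical stability
    mean_exp = sum(math.exp(eta * (g - m)) for g in rivals) / len(rivals)
    return -scores[correct] + m + math.log(mean_exp) / eta


def mce_loss(d, gamma=1.0):
    """Sigmoid loss: a smooth, differentiable stand-in for the 0/1 error count."""
    return 1.0 / (1.0 + math.exp(-gamma * d))


def mce_loss_grad(d, gamma=1.0):
    """Derivative of the sigmoid loss with respect to d: gamma * l * (1 - l)."""
    l = mce_loss(d, gamma)
    return gamma * l * (1.0 - l)
```

In a gradient-based MCE procedure, the derivative returned by `mce_loss_grad` would be chained through the misclassification measure and the class scores down to the model parameters, including the weights `w_k`. Because those weights are then updated by the loss gradient rather than by an ML re-estimation formula, no separate constraint on them needs to be imposed in this sketch.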
