Duration-Distribution-Based HMM for Speech Recognition

To overcome the shortcomings of duration modeling in the homogeneous hidden Markov model (HMM) for speech recognition, this paper proposes a duration-distribution-based HMM (DDBHMM), built on a formal definition of a left-to-right inhomogeneous Markov model. The model is shown to be defined equivalently by either the state duration or the state transition probability. Speaker-independent continuous speech recognition experiments show that modeling only the state duration in DDBHMM yields a significant improvement (a 17.8% error rate reduction) over the classical HMM. These properties make DDBHMM promising for many aspects of speech modeling, such as state duration, speed variation, speech discontinuity, and interframe correlation.
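
The equivalence between a duration description and a transition description can be illustrated with a minimal sketch. It assumes a single left-to-right state with a discrete duration distribution of finite support, and it uses the standard hazard-rate identity a(d) = P(D > d | D >= d) for the probability of staying in the state after d frames. The function names and the example numbers are hypothetical and are not taken from the paper, whose formal definition may differ in detail.

```python
import numpy as np

def duration_to_transitions(pmf):
    """Map a duration pmf p(d), d = 1..D, to duration-dependent
    self-transition probabilities a(d) = P(stay | d frames spent so far).
    Illustrative helper, not the paper's notation."""
    pmf = np.asarray(pmf, dtype=float)
    # survival[d-1] = P(duration >= d), computed as a reversed cumulative sum
    survival = np.cumsum(pmf[::-1])[::-1]
    # Probability of leaving right after frame d, given the state lasted d frames;
    # where survival is zero the state cannot be occupied, so leaving is forced.
    leave = np.divide(pmf, survival, out=np.ones_like(pmf), where=survival > 0)
    return 1.0 - leave

def transitions_to_duration(stay):
    """Inverse map: recover p(d) from duration-dependent stay probabilities."""
    stay = np.asarray(stay, dtype=float)
    # reach[d-1] = P(duration >= d) = product of stay probabilities before frame d
    reach = np.concatenate(([1.0], np.cumprod(stay[:-1])))
    return reach * (1.0 - stay)

if __name__ == "__main__":
    pmf = [0.2, 0.5, 0.3]                  # assumed example duration distribution
    stay = duration_to_transitions(pmf)    # inhomogeneous (duration-dependent) transitions
    print(stay)
    print(transitions_to_duration(stay))   # round trip back to the pmf
```

In this sketch a homogeneous HMM corresponds to a constant stay probability, which forces a geometric duration distribution; letting the stay probability depend on the time already spent in the state removes that restriction.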
