Robust parametric modeling of durations in hidden Markov models

We address the problem of explicit state and word duration modeling in hidden Markov models (HMMs). A major weakness of conventional HMMs is that they implicitly model state durations by a geometric distribution, which is usually inappropriate. Using explicit modeling of state and word durations, it is possible to significantly enhance the performance of speech recognition systems. The main outcome of this work is a modified Viterbi algorithm that by incorporating both state and word duration modeling, reduces the string error rate of the conventional Viterbi algorithm by 29% and 43% for known and unknown string lengths respectively, for a speaker independent, connected digit string task. The uniqueness of the algorithm is that unlike alternative approaches, it adds the duration metric at each frame transition (and not at the end of a state, word or sentence), thus enhancing the performance.

[1]  Raj Reddy,et al.  Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[2]  R. G. Leonard,et al.  A database for speaker-independent digit recognition , 1984, ICASSP.

[3]  Frank K. Soong,et al.  High performance connected digit recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..

[4]  C. R. Rao,et al.  Linear Statistical Inference and its Applications , 1968 .

[5]  Calyampudi Radhakrishna Rao,et al.  Linear Statistical Inference and its Applications , 1967 .

[6]  X. D. Huang,et al.  Phoneme classification using semicontinuous hidden Markov models , 1992, IEEE Trans. Signal Process..

[7]  Jerome R. Bellegarda,et al.  Tied mixture continuous parameter modeling for speech recognition , 1990, IEEE Trans. Acoust. Speech Signal Process..

[8]  Chin-Hui Lee,et al.  A frame-synchronous network search algorithm for connected word recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[9]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[10]  Calyampudi R. Rao,et al.  Linear Statistical Inference and Its Applications. , 1975 .

[11]  R. Moore,et al.  Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Stephen E. Levinson,et al.  Continuously variable duration hidden Markov models for speech analysis , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.