Continuously variable duration hidden Markov models for speech analysis

During the past decade, the applicability of hidden Markov models (HMMs) to various facets of speech analysis has been demonstrated in several different experiments. These investigations all rest on the assumption that speech is a quasi-stationary process whose stationary intervals can be identified with the occupancy of a single state of an appropriate HMM. In the traditional form of the HMM, the probability of remaining in a state for a given duration decreases exponentially with that duration. This behavior does not provide an adequate representation of the temporal structure of speech. The solution proposed here is to replace the probability distributions of duration with continuous probability density functions to form a continuously variable duration hidden Markov model (CVDHMM). The gamma distribution is ideally suited to specification of the durational density since it is one-sided and has only two parameters which, together, define both mean and variance. The main result is a derivation and proof of convergence of reestimation formulae for all the parameters of the CVDHMM. It is interesting to note that if the state durations are gamma distributed, one of the formulae is nonalgebraic but, fortuitously, has properties such that it is easily and rapidly solved numerically to any desired degree of accuracy. Other results are presented, including the performance of the formulae on simulated data.
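The contrast between the two duration models can be illustrated with a minimal sketch. It is not taken from the paper: the function names and the symbols `a_ii` (self-transition probability), `nu` (gamma shape), and `eta` (gamma rate) are assumptions chosen for illustration. In a traditional HMM the probability of staying exactly `d` frames is geometric in `d`, so it is always largest at `d = 1`, whereas a gamma density with the same mean can peak at a duration well away from zero.

```python
import math


def geometric_duration_pmf(d, a_ii):
    """Probability of occupying a state for exactly d frames (d >= 1)
    when the self-transition probability is a_ii: a_ii**(d-1) * (1 - a_ii)."""
    return a_ii ** (d - 1) * (1.0 - a_ii)


def gamma_params_from_moments(mean, var):
    """Shape nu and rate eta of a gamma density with the given mean and
    variance, using mean = nu / eta and var = nu / eta**2."""
    eta = mean / var
    nu = mean * eta
    return nu, eta


def gamma_pdf(tau, nu, eta):
    """Gamma density at duration tau; zero for tau <= 0 (one-sided)."""
    if tau <= 0.0:
        return 0.0
    # Work in the log domain to avoid overflow for large shape values.
    log_p = (nu * math.log(eta) + (nu - 1.0) * math.log(tau)
             - eta * tau - math.lgamma(nu))
    return math.exp(log_p)


if __name__ == "__main__":
    mean, var = 8.0, 16.0                  # hypothetical duration statistics
    nu, eta = gamma_params_from_moments(mean, var)
    a_ii = 1.0 - 1.0 / mean                # geometric model with the same mean
    for d in (1, 3, 6, 12):
        print(d, geometric_duration_pmf(d, a_ii), gamma_pdf(float(d), nu, eta))
```

With these hypothetical statistics the geometric probabilities fall monotonically from `d = 1`, while the gamma density rises to a mode at `(nu - 1) / eta` and then decays, which is the kind of durational shape the exponential model cannot express.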
