Maximum likelihood hidden Markov modeling using a dominant sequence of states

Approximate maximum likelihood (ML) hidden Markov modeling using the most likely state sequence (MLSS) is examined and compared with the exact ML approach that considers all possible state sequences. It is shown that for any hidden Markov model (HMM), the difference between the approximate and the exact normalized likelihood functions cannot exceed the logarithm of the number of states divided by the dimension of the output vectors (frame length). Furthermore, for Gaussian HMMs and a given observation sequence, the MLSS is typically the sequence of nearest neighbor states in the Itakura-Saito sense, and the posterior probability of any state sequence which departs from the MLSS in a single time instant, decays exponentially with the frame length. Hence, for a sufficiently large frame length the exact and approximate ML approach provide similar model estimates and likelihood values. >

[1]  R. Gray,et al.  Speech coding based upon vector quantization , 1980, ICASSP.

[2]  Biing-Hwang Juang,et al.  The segmental K-means algorithm for estimating parameters of hidden Markov models , 1990, IEEE Trans. Acoust. Speech Signal Process..

[3]  L. Baum,et al.  An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology , 1967 .

[4]  Lawrence R. Rabiner,et al.  A segmental k-means training procedure for connected word recognition , 1986, AT&T Technical Journal.

[5]  R. McAulay Maximum likelihood spectral estimation and its application to narrow-band speech coding , 1984 .

[6]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[7]  Robert M. Gray,et al.  Toeplitz And Circulant Matrices , 1977 .

[8]  Biing-Hwang Juang,et al.  On the application of hidden Markov models for enhancing noisy speech , 1989, IEEE Trans. Acoust. Speech Signal Process..

[9]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[10]  A. Poritz,et al.  Hidden Markov models: a guided tour , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[11]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[12]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[14]  A. Barron THE STRONG ERGODIC THEOREM FOR DENSITIES: GENERALIZED SHANNON-MCMILLAN-BREIMAN THEOREM' , 1985 .

[15]  L. Baum,et al.  Statistical Inference for Probabilistic Functions of Finite State Markov Chains , 1966 .

[16]  Roberto Billi,et al.  Vector quantization and Markov source models applied to speech recognition , 1982, ICASSP.

[17]  N. Merhav,et al.  Hidden Markov modeling using a dominant state sequence with application to speech recognition , 1991 .

[18]  Robert M. Gray,et al.  Rate-distortion speech coding with a minimum discrimination information distortion measure , 1981, IEEE Trans. Inf. Theory.

[19]  Biing-Hwang Juang,et al.  Mixture autoregressive hidden Markov models for speech signals , 1985, IEEE Trans. Acoust. Speech Signal Process..

[20]  Lawrence R. Rabiner,et al.  On the relations between modeling approaches for speech recognition , 1990, IEEE Trans. Inf. Theory.

[21]  S. Qureshi,et al.  Adaptive equalization , 1982, Proceedings of the IEEE.

[22]  Y. Sato,et al.  A Method of Self-Recovering Equalization for Multilevel Amplitude-Modulation Systems , 1975, IEEE Trans. Commun..

[23]  New York Dover,et al.  ON THE CONVERGENCE PROPERTIES OF THE EM ALGORITHM , 1983 .