Improved estimation of hidden Markov model parameters from multiple observation sequences

Hidden Markov models (HMMs) are popular in pattern recognition largely because their parameters can be "learned" from an observation sequence through Baum-Welch and other re-estimation procedures. When HMM parameters are estimated from an ensemble of observation sequences rather than a single sequence, we require techniques for finding the parameters that maximize the likelihood of the estimated model given the entire set of observation sequences. The importance of this study is that HMMs with parameters estimated from multiple observation sequences are shown to be many orders of magnitude more probable than HMMs learned from any single observation sequence, so the effectiveness of HMM "learning" is greatly enhanced. In this paper we present techniques that usually find models significantly more likely than those found by Rabiner's well-known method, on both seen and unseen sequences.
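To make the objective concrete, the following is a minimal sketch of the quantity being maximized: the joint log-likelihood of a set of observation sequences under a single discrete HMM. It assumes the sequences are statistically independent given the model, so their log-likelihoods add, and it uses the standard scaled forward algorithm. The function names, the two-state parameters, and the toy sequences are illustrative assumptions and are not taken from the paper; the sketch evaluates a given model and does not implement the estimation techniques proposed here.

```python
import numpy as np

def sequence_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | pi, A, B) for a discrete HMM.

    pi: initial state probabilities, shape (N,)
    A:  state transition matrix, A[i, j] = P(state j at t+1 | state i at t)
    B:  emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]              # forward variable at t = 0
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]   # propagate, then emit
        scale = alpha.sum()                # P(o_t | o_0 .. o_{t-1})
        log_lik += np.log(scale)
        alpha = alpha / scale              # rescale to avoid underflow
    return log_lik

def ensemble_log_likelihood(pi, A, B, sequences):
    """Joint log-likelihood of all sequences, assuming independence."""
    return sum(sequence_log_likelihood(pi, A, B, obs) for obs in sequences)

# Hypothetical two-state, two-symbol model and toy data.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
sequences = [[0, 0, 1], [1, 1, 0, 1], [0, 1]]

total = ensemble_log_likelihood(pi, A, B, sequences)
print(f"joint log-likelihood of the ensemble: {total:.4f}")
```

Comparing this joint value for two candidate models is one way to check which of them better explains the whole ensemble rather than any one sequence.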
