A Bayesian approach for structural learning with hidden Markov models

Hidden Markov Models(HMM) have proved to be a successful modeling paradigm for dynamic and spatial processes in many domains, such as speech recognition, genomics, and general sequence alignment. Typically, in these applications, the model structures are predefined by domain experts. Therefore, the HMM learning problem focuses on the learning of the parameter values of the model to fit the given data sequences. However, when one considers other domains, such as, economics and physiology, model structure capturing the system dynamic behavior is not available. In order to successfully apply the HMM methodology in these domains, it is important that a mechanism is available for automatically deriving the model structure from the data. This paper presents a HMM learning procedure that simultaneously learns the model structure and the maximum likelihood parameter values of a HMM from data. The HMM model structures are derived based on the Bayesian model selection methodology. In addition, we introduce a new initialization procedure for HMM parameter value estimation based on the K-means clustering method. Experimental results with artificially generated data show the effectiveness of the approach.

[1]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[2]  Yoram Singer,et al.  Training Algorithms for Hidden Markov Models using Entropy Based Distance Functions , 1996, NIPS.

[3]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[4]  Andreas Stolcke,et al.  Best-first Model Merging for Hidden Markov Model Induction , 1994, ArXiv.

[5]  Mari Ostendorf,et al.  HMM topology design using maximum likelihood successive state splitting , 1997, Comput. Speech Lang..

[6]  Shigeki Sagayama,et al.  A successive state splitting algorithm for efficient allophone modeling , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  David Maxwell Chickering,et al.  Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables , 1997, Machine Learning.

[8]  Anil K. Jain,et al.  39 Dimensionality and sample size considerations in pattern recognition practice , 1982, Classification, Pattern Recognition and Reduction of Dimensionality.

[9]  J. Rissanen A UNIVERSAL PRIOR FOR INTEGERS AND ESTIMATION BY MINIMUM DESCRIPTION LENGTH , 1983 .

[10]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[11]  Francisco Casacuberta,et al.  Learning the structure of HMM's through grammatical inference techniques , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[12]  Douglas H. Fisher,et al.  Supervised classification with temporal data , 1997 .

[13]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[14]  Gregory F. Cooper,et al.  A Bayesian Method for the Induction of Probabilistic Networks from Data , 1992 .

[15]  S. Chib Marginal Likelihood from the Gibbs Output , 1995 .

[16]  Matthew Brand,et al.  Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction , 1999, Neural Computation.

[17]  Russell Greiner,et al.  Model Selection Criteria for Learning Belief Nets: An Empirical Comparison , 2000, ICML.

[18]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[19]  Michael I. Jordan,et al.  Factorial Hidden Markov Models , 1995, Machine Learning.

[20]  G. Casella,et al.  Explaining the Gibbs Sampler , 1992 .

[21]  Alan K. Mackworth,et al.  Evaluation of gene-finding programs on mammalian sequences. , 2001, Genome research.

[22]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[23]  Sam Kwong,et al.  Training approach for hidden Markov models , 1996 .

[24]  Hagai Attias,et al.  A Variational Bayesian Framework for Graphical Models , 1999 .

[25]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[26]  Gautam Biswas,et al.  A Bayesian Approach to Temporal Data Clustering using Hidden Markov Models , 2000, ICML.

[27]  Matthew J. Beal,et al.  Graphical Models and Variational Methods , 2001 .

[28]  James H. Martin,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[29]  Lalit R. Bahl,et al.  Maximum mutual information estimation of hidden Markov model parameters for speech recognition , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[30]  William J. Stewart,et al.  Introduction to the numerical solution of Markov Chains , 1994 .

[31]  Zoubin Ghahramani,et al.  Variational Inference for Bayesian Mixtures of Factor Analysers , 1999, NIPS.

[32]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .