Massachusetts Institute of Technology
Artificial Intelligence Laboratory and Center for Biological and Computational Learning
Department of Brain and Cognitive Sciences

Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a single discrete variable—the hidden state. We discuss a generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner. We describe an exact algorithm for inferring the posterior probabilities of the hidden state variables given the observations, and relate it to the forward–backward algorithm for HMMs and to algorithms for more general graphical models. Due to the combinatorial nature of the hidden state representation, this exact algorithm is intractable. As in other intractable systems, approximate inference can be carried out using Gibbs sampling or variational methods. Within the variational framework, we present a structured approximation in which the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model. Empirical comparisons suggest that these approximations are efficient and provide accurate alternatives to the exact methods. Finally, we use the structured approximation to model Bach's chorales and show that factorial HMMs can capture statistical structure in this data set that an unconstrained HMM cannot.
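To make the factored state representation concrete, the following is a minimal generative sketch of a factorial HMM in Python. It is not the authors' implementation; the variable names (M, K, W, etc.) and the choice of a Gaussian output whose mean sums per-chain contributions are assumptions made for illustration, and the final line only highlights why exact inference over the joint state becomes intractable as the number of chains grows.

```python
import numpy as np

# Illustrative factorial HMM: M independent Markov chains, each with K
# discrete states; the observation at each time step is Gaussian with a
# mean given by the sum of per-chain output contributions.

M, K, D, T = 3, 4, 2, 50          # chains, states per chain, obs dim, length
rng = np.random.default_rng(0)

pi = rng.dirichlet(np.ones(K), size=M)        # (M, K) initial distributions
A  = rng.dirichlet(np.ones(K), size=(M, K))   # (M, K, K) transition matrices
W  = rng.normal(size=(M, K, D))               # per-chain output means
C  = 0.1 * np.eye(D)                          # observation covariance

states = np.zeros((T, M), dtype=int)
obs = np.zeros((T, D))
for t in range(T):
    for m in range(M):
        p = pi[m] if t == 0 else A[m, states[t - 1, m]]
        states[t, m] = rng.choice(K, p=p)      # each chain evolves independently
    mean = sum(W[m, states[t, m]] for m in range(M))
    obs[t] = rng.multivariate_normal(mean, C)  # chains couple only through the output

# Exact inference must treat the joint state as a single variable with
# K**M values, which is why the exact forward-backward recursions blow up
# and approximate (Gibbs or variational) inference is used instead.
print("effective joint state space:", K ** M)
```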
