Variational Learning for Switching State-Space Models

We introduce a new statistical model for time series that iteratively segments data into regimes with approximately linear dynamics and learns the parameters of each of these linear regimes. This model combines and generalizes two of the most widely used stochastic time-series models (hidden Markov models and linear dynamical systems) and is closely related to models that are widely used in the control and econometrics literatures. It can also be derived by extending the mixture of experts neural network (Jacobs, Jordan, Nowlan, & Hinton, 1991) to its fully dynamical version, in which both expert and gating networks are recurrent. Inferring the posterior probabilities of the hidden states of this model is computationally intractable, and therefore the exact expectation-maximization (EM) algorithm cannot be applied. However, we present a variational approximation that maximizes a lower bound on the log-likelihood and makes use of both the forward and backward recursions for hidden Markov models and the Kalman filter recursions for linear dynamical systems. We tested the algorithm on artificial data sets and a natural data set of respiration force from a patient with sleep apnea. The results suggest that variational approximations are a viable method for inference and learning in switching state-space models.
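To make the generative picture in the abstract concrete, here is a minimal NumPy sketch of a switching state-space model in the spirit of the paper: M linear dynamical systems evolve in parallel, a discrete Markov switch chooses which one generates each observation, and segmenting the series amounts to inferring the switch. All dimensions, parameter values, and variable names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's): M regimes, K-dim states,
# D-dim observations, T time steps.
M, K, D, T = 2, 2, 1, 200

Z = np.array([[0.98, 0.02],           # switch transition matrix (rows sum to 1)
              [0.02, 0.98]])
A = [0.99 * np.eye(K), 0.90 * np.eye(K)]             # per-regime state dynamics
C = [rng.standard_normal((D, K)) for _ in range(M)]  # per-regime output maps
Q = 0.01 * np.eye(K)                  # state-noise covariance (shared here)
R = 0.10 * np.eye(D)                  # observation-noise covariance

s = 0                                           # discrete switch state
x = [rng.standard_normal(K) for _ in range(M)]  # one continuous state per regime
switches, obs = [], []
for t in range(T):
    s = rng.choice(M, p=Z[s])                   # switch evolves as a Markov chain
    for m in range(M):                          # all regimes evolve in parallel
        x[m] = A[m] @ x[m] + rng.multivariate_normal(np.zeros(K), Q)
    y = C[s] @ x[s] + rng.multivariate_normal(np.zeros(D), R)  # active regime emits
    switches.append(s)
    obs.append(y)

obs = np.array(obs)   # shape (T, D): the observed series to be segmented
```

Exact posterior inference couples the switch with the continuous states, and the number of possible regime histories grows exponentially with T, which is why exact EM is intractable here. As the abstract notes, the variational approximation instead combines the HMM forward-backward recursions (over the switch) with the Kalman filter recursions (over each linear regime), iterated until the lower bound on the log-likelihood converges.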

[1] R. E. Kalman, et al. New Results in Linear Filtering and Prediction Theory, 1961.

[2] H. Rauch. Solutions to the linear smoothing problem, 1963.

[3] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.

[4] K. Ito, et al. On State Estimation in Switching Environments, 1970.

[5] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[6] M. Athans, et al. State Estimation for Discrete Systems with Switching Parameters, 1978, IEEE Transactions on Aerospace and Electronic Systems.

[7] R. Shumway, et al. An Approach to Time Series Smoothing and Forecasting Using the EM Algorithm, 1982.

[8] B. Anderson, et al. Optimal Filtering, 1979, IEEE Transactions on Systems, Man, and Cybernetics.

[9] Lennart Ljung, et al. Theory and Practice of Recursive Identification, 1983.

[10] Graham C. Goodwin, et al. Adaptive filtering prediction and control, 1984.

[11] Jonathan D. Cryer, et al. Time Series Analysis, 1986.

[12] L. Rabiner, et al. An introduction to hidden Markov models, 1986, IEEE ASSP Magazine.

[13] David J. Spiegelhalter, et al. Local computations with probabilities on graphical structures and their application to expert systems, 1990.

[14] G. Parisi, et al. Statistical Field Theory, 1988.

[15] Judea Pearl, et al. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1991, Morgan Kaufmann Series in Representation and Reasoning.

[16] Keiji Kanazawa, et al. A model for reasoning about persistence and causation, 1989.

[17] James D. Hamilton. A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle, 1989.

[18] R. Shumway, et al. Dynamic linear models with switching, 1991.

[19] Thomas M. Cover, et al. Elements of Information Theory, 1991.

[20] R. A. Jacobs, et al. Adaptive Mixtures of Local Experts, 1991, Neural Computation.

[21] Biing-Hwang Juang, et al. Hidden Markov Models for Speech Recognition, 1991.

[22] Padhraic J. Smyth, et al. Hidden Markov models for fault detection in dynamic systems, 1993.

[23] J. R. Rohlicek, et al. ML estimation of a stochastic linear system with the EM algorithm and its application to speech recognition, 1993, IEEE Trans. Speech Audio Process.

[24] Steven J. Nowlan, et al. Mixtures of Controllers for Jump Linear and Non-Linear Plants, 1993, NIPS.

[25] Li Deng, et al. A stochastic model of speech incorporating hierarchical nonstationarity, 1993, IEEE Trans. Speech Audio Process.

[26] Robert A. Jacobs, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.

[27] Yoshua Bengio, et al. An Input Output HMM Architecture, 1994, NIPS.

[28] M. A. McClure, et al. Hidden Markov models of biological primary sequence information, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[29] Andreas S. Weigend, et al. Time Series Prediction: Forecasting the Future and Understanding the Past, 1994.

[30] John B. Moore, et al. Hidden Markov Models: Estimation and Control, 1994.

[31] R. Kohn, et al. On Gibbs sampling for state space models, 1994.

[32] Naonori Ueda, et al. Deterministic Annealing Variant of the EM Algorithm, 1994, NIPS.

[33] Chang-Jin Kim, et al. Dynamic linear models with Markov-switching, 1994.

[34] Stuart J. Russell, et al. Stochastic simulation algorithms for dynamic probabilistic networks, 1995, UAI.

[35] Michael I. Jordan, et al. Learning Fine Motion by Markov Mixtures of Experts, 1995, NIPS.

[36] Visakan Kadirkamanathan, et al. Recursive Estimation of Dynamic Modular RBF Networks, 1995, NIPS.

[37] Michael I. Jordan, et al. Exploiting Tractable Substructures in Intractable Networks, 1995, NIPS.

[38] Klaus-Robert Müller, et al. Annealed Competition of Experts for a Segmentation and Classification of Switching Dynamics, 1996, Neural Computation.

[39] Geoffrey E. Hinton, et al. The EM algorithm for mixtures of factor analyzers, 1996.

[40] Geoffrey E. Hinton, et al. Parameter estimation for linear dynamical systems, 1996.

[41] Geoffrey E. Hinton, et al. Modeling the manifolds of images of handwritten digits, 1997, IEEE Trans. Neural Networks.

[42] Athanasios Kehagias, et al. Time-Series Segmentation Using Predictive Modular Neural Networks, 1997, Neural Computation.

[43] Joydeep Ghosh, et al. A mixture-of-experts framework for adaptive Kalman filtering, 1997, IEEE Trans. Syst. Man Cybern. Part B.

[44] Michael I. Jordan, et al. Probabilistic Independence Networks for Hidden Markov Probability Models, 1997, Neural Computation.

[45] Michael I. Jordan. Learning in Graphical Models, 1999, NATO ASI Series.

[46] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants, 1998, Learning in Graphical Models.

[47] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.

[48] Zoubin Ghahramani, et al. A Unifying Review of Linear Gaussian Models, 1999, Neural Computation.

[49] Michael O. Kolawole, et al. Estimation and tracking, 2002.