Reduced-Rank Hidden Markov Models

Hsu et al. (2009) recently proposed an efficient, accurate spectral learning algorithm for Hidden Markov Models (HMMs). In this paper we relax their assumptions and prove a tighter finite-sample error bound for the case of Reduced-Rank HMMs, i.e., HMMs with low-rank transition matrices. Since rank-k RR-HMMs are a larger class of models than k-state HMMs while being equally efficient to work with, this relaxation greatly increases the learning algorithm’s scope. In addition, we generalize the algorithm and bounds to models where multiple observations are needed to disambiguate state, and to models that emit multivariate real-valued observations. Finally, we prove consistency for learning Predictive State Representations, an even larger class of models. Experiments on synthetic data and a toy video, as well as on difficult robot vision data, yield accurate models that compare favorably with alternatives in simulation quality and prediction accuracy.

[1] Marcel Paul Schützenberger, et al. On the Definition of a Family of Automata, 1961, Inf. Control.

[2] K. Fu, et al. On state estimation in switching environments, 1968.

[3] P. Wedin. Perturbation bounds in connection with singular value decomposition, 1972.

[4] L. Baum, et al. An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process, 1972.

[5] Charles R. Johnson, et al. Matrix analysis, 1985, Statistical Inference for Engineers and Data Scientists.

[6] P. J. Green, et al. Density Estimation for Statistics and Data Analysis, 1987.

[7] Colin McDiarmid, et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[8] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[9] G. Stewart, et al. Matrix Perturbation Theory, 1990.

[10] R. Shumway, et al. Dynamic linear models with switching, 1991.

[11] V. Balasubramanian. Equivalence and Reduction of Hidden Markov Models, 1993.

[12] Yoshua Bengio, et al. An Input Output HMM Architecture, 1994, NIPS.

[13] Geoffrey E. Hinton, et al. Parameter estimation for linear dynamical systems, 1996.

[14] Andreas G. Andreou, et al. Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition, 1998, Speech Commun.

[15] G. W. Stewart, et al. Matrix Algorithms: Volume 1, Basic Decompositions, 1998.

[16] Geoffrey E. Hinton, et al. Variational Learning for Switching State-Space Models, 2000, Neural Computation.

[17] Herbert Jaeger, et al. Observable Operator Models for Discrete Stochastic Time Series, 2000, Neural Computation.

[18] Jun S. Liu, et al. Mixture Kalman filters, 2000.

[19] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.

[20] T. Başar, et al. A New Approach to Linear Filtering and Prediction Problems, 2001.

[21] Michael O. Kolawole, et al. Estimation and tracking, 2002.

[22] Satinder P. Singh, et al. A Nonlinear Predictive State Representation, 2003, NIPS.

[23] Peter Stone, et al. Learning Predictive State Representations, 2003, ICML.

[24] Michael R. James, et al. Learning and discovery of predictive state representations in dynamical systems with reset, 2004, ICML.

[25] Michael R. James, et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.

[26] Sebastian Thrun, et al. Learning low dimensional predictive representations, 2004, ICML.

[27] Patrik O. Hoyer, et al. Non-negative Matrix Factorization with Sparseness Constraints, 2004, J. Mach. Learn. Res.

[28] D. Popovici, et al. Learning observable operator models via the ES algorithm, 2005.

[29] David J. Fleet, et al. Gaussian Process Dynamical Models, 2005, NIPS.

[30] Michael R. James, et al. Learning predictive state representations in dynamical systems without reset, 2005, ICML.

[31] Elchanan Mossel, et al. Learning nonsingular phylogenies and hidden Markov models, 2005, Symposium on the Theory of Computing.

[32] Byron Boots, et al. A Constraint Generation Approach to Learning Stable Linear Dynamical Systems, 2007, NIPS.

[33] Satinder P. Singh, et al. Exponential Family Predictive Representations of State, 2007, NIPS.

[34] Eric Wiewiora, et al. Modeling probability distributions with predictive state representations, 2007.

[35] Andrew W. Moore, et al. Fast State Discovery for HMM Model Selection and Learning, 2007, AISTATS.

[36] Herbert Jaeger, et al. A Bound on Modeling Error in Observable Operator Models and an Associated Learning Algorithm, 2009, Neural Computation.

[37] Sham M. Kakade, et al. A spectral algorithm for learning Hidden Markov Models, 2008, J. Comput. Syst. Sci.

[38] John Langford, et al. Learning nonlinear dynamic models, 2009, ICML '09.

[39] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.

[40] Bart De Moor, et al. Subspace Identification for Linear Systems: Theory ― Implementation ― Applications, 2011.

[41] Byron Boots, et al. Closing the learning-planning loop with predictive state representations, 2011, Int. J. Robotics Res.