Learning latent variable and predictive models of dynamical systems

A variety of learning problems in robotics, computer vision and other areas of artificial intelligence can be construed as problems of learning statistical models for dynamical systems from sequential observations. Good dynamical system models allow us to represent and predict observations in these systems, which in turn enables applications such as classification, planning, control, simulation, anomaly detection and forecasting. One class of dynamical system models assumes the existence of an underlying hidden random variable that evolves over time and emits the observations we see. Past observations are summarized into the belief distribution over this random variable, which represents the state of the system. This assumption leads to ‘latent variable models’, which are used heavily in practice. However, learning algorithms for these models still face a variety of issues such as model selection, local optima and instability. The representational ability of these models also differs significantly depending on whether the underlying latent variable is assumed to be discrete, as in Hidden Markov Models (HMMs), or real-valued, as in Linear Dynamical Systems (LDSs). Another recently introduced class of models represents state as a set of predictions about future observations rather than as a latent variable summarizing the past. These ‘predictive models’, such as Predictive State Representations (PSRs), are provably more powerful than latent variable models and hold the promise of allowing more accurate, efficient learning algorithms since no hidden quantities are involved. However, this promise has not been realized. In this thesis we propose novel learning algorithms that address the issues of model selection, local optima and instability in learning latent variable models. We show that certain ‘predictive’ latent variable model learning methods bridge the gap between latent variable and predictive models.
We also propose a novel latent variable model, the Reduced-Rank HMM (RR-HMM), that combines desirable properties of discrete and real-valued latent-variable models. We show that reparameterizing the class of RR-HMMs yields a subset of PSRs, and propose an asymptotically unbiased predictive learning algorithm for RR-HMMs and PSRs along with finite-sample error bounds for the RR-HMM case. In terms of efficiency and accuracy, our methods outperform alternatives on dynamic texture videos, mobile robot visual sensing data, and other domains.
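As a concrete illustration of how a latent variable model summarizes past observations into a belief distribution, the following is a minimal sketch of belief filtering in a discrete HMM. It is not the learning algorithm proposed in the thesis; the two-state transition and observation tables, the function names, and the convention that the prior describes the state one step before the first observation are all assumptions made for the example.

```python
def belief_update(belief, T, O, obs):
    """One step of HMM belief filtering (illustrative sketch).

    belief[i] = P(state_t = i | observations so far)
    T[i][j]   = P(state_{t+1} = j | state_t = i)
    O[j][y]   = P(observation = y | state = j)
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [sum(belief[i] * T[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight each state by the likelihood of the new observation.
    unnorm = [predicted[j] * O[j][obs] for j in range(n)]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    # Normalize so the belief is again a probability distribution.
    return [u / z for u in unnorm]


def filter_sequence(prior, T, O, observations):
    """Fold a whole observation sequence into a single belief vector."""
    belief = list(prior)
    for y in observations:
        belief = belief_update(belief, T, O, y)
    return belief


# Hypothetical two-state, two-symbol model used only for this example.
T = [[0.9, 0.1],
     [0.2, 0.8]]
O = [[0.8, 0.2],
     [0.3, 0.7]]
prior = [0.5, 0.5]

final_belief = filter_sequence(prior, T, O, [0, 0, 1])
```

The belief vector is the only quantity carried from one time step to the next, which is the sense in which it represents the state of the system. A predictive model would instead carry a vector of predictions about future observations.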
