Spectral Approaches to Learning Predictive Representations

A central problem in artificial intelligence is choosing actions that maximize reward in a partially observable, uncertain environment. Doing so requires first obtaining an accurate model of the environment and then planning with that model to maximize reward. For complex domains, however, specifying a model by hand can be time-consuming. This motivates an alternative approach: learning a model directly from observations. Unfortunately, existing learning algorithms often recover a model that is too inaccurate to support planning, or too large and complex for planning to succeed; others require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose spectral subspace identification algorithms that provably learn compact, accurate, predictive models of partially observable dynamical systems directly from sequences of action-observation pairs. Our research agenda includes several variations of this general approach: spectral methods for classical models such as Kalman filters and hidden Markov models, batch and online algorithms, and kernel-based algorithms for learning models in high- and infinite-dimensional feature spaces. All of these approaches share a common framework: the model's belief space is represented as predictions of observable quantities, and spectral algorithms are applied to learn the model parameters. Unlike the popular EM algorithm, spectral learning algorithms are statistically consistent, computationally efficient, and easy to implement using established matrix-algebra techniques. We evaluate our learning algorithms on a series of prediction and planning tasks involving simulated data and real robotic systems.
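To make the shared framework concrete, the following is a minimal sketch of one well-known instance of the idea: spectral learning of a hidden Markov model in the style of Hsu, Kakade, and Zhang. It is an illustration under stated assumptions, not the algorithms proposed in this work. The toy sizes `m` and `n`, the random ground-truth model, and all function names are hypothetical, and exact moments are used in place of moments estimated from data so that the result can be checked against the forward algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 4  # hidden states, distinct observations (toy sizes, assumed)

def random_stochastic(rows, cols):
    """Random matrix whose columns are probability distributions."""
    M = rng.random((rows, cols)) + 0.1
    return M / M.sum(axis=0, keepdims=True)

# Ground-truth HMM, used only to produce moments and to check the result.
T = random_stochastic(m, m)         # T[i, j] = P(h' = i | h = j)
O = random_stochastic(n, m)         # O[x, j] = P(obs = x | h = j)
pi = random_stochastic(m, 1)[:, 0]  # initial state distribution

# Exact low-order moments of the observations. In practice these would be
# estimated from counts of observation singles, pairs, and triples.
P1 = O @ pi                                   # P1[i] = P(x1 = i)
P21 = O @ T @ np.diag(pi) @ O.T               # P21[i, j] = P(x2 = i, x1 = j)
P3x1 = [O @ T @ np.diag(O[x]) @ T @ np.diag(pi) @ O.T
        for x in range(n)]                    # P3x1[x][i, j] = P(x3 = i, x2 = x, x1 = j)

# Spectral step: keep the top-m left singular vectors of the pair moments.
U = np.linalg.svd(P21)[0][:, :m]

# Model parameters expressed purely in terms of observable quantities.
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def spectral_prob(seq):
    """Joint probability of an observation sequence under the learned operators."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

def forward_prob(seq):
    """Same probability from the ground-truth HMM via the forward algorithm."""
    alpha = pi
    for x in seq:
        alpha = T @ (O[x] * alpha)
    return float(alpha.sum())

seq = [0, 3, 1, 2]
print(spectral_prob(seq), forward_prob(seq))
```

With exact moments and full-rank `T` and `O`, the two printed probabilities should agree up to floating-point error. When the moments are replaced by empirical estimates, the learned operators inherit the estimation error, which is where the statistical-consistency argument comes in: as the estimates converge, so do the predictions.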
