An Efficient, Expressive and Local Minima-Free Method for Learning Controlled Dynamical Systems

We propose a framework for modeling and estimating the state of controlled dynamical systems, in which an agent affects the system through actions and receives partial observations. Based on this framework, we propose the Predictive State Representation with Random Fourier Features (RFF-PSR). A key property of RFF-PSRs is that the state estimate is represented by a conditional distribution of future observations given future actions. RFF-PSRs combine this representation with moment matching, kernel embedding, and local optimization to obtain a method with several favorable qualities: it can represent controlled environments, in which the agent's actions affect the system; it has an efficient and theoretically justified learning algorithm; and it uses a non-parametric representation with enough expressive power to model continuous non-linear dynamics. We provide a detailed formulation, a theoretical analysis, and an experimental evaluation that demonstrates the effectiveness of our method.
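
To make the random Fourier feature ingredient concrete, the following is a minimal sketch of the standard construction for approximating a Gaussian kernel with a finite-dimensional feature map. It is not the RFF-PSR learning algorithm itself, only an illustration of the kind of feature map such a method can build on. The names `make_rff`, `gaussian_kernel`, and `dot`, as well as the bandwidth and feature count, are assumptions chosen for illustration.

```python
import math
import random


def make_rff(dim, num_features, bandwidth, seed=0):
    """Return a feature map z with z(x).z(y) ~= exp(-||x - y||^2 / (2 * bandwidth^2)).

    Illustrative sketch only; all names and defaults are assumptions.
    """
    rng = random.Random(seed)
    # Frequencies are sampled from the Gaussian kernel's spectral density,
    # which is a zero-mean normal with standard deviation 1 / bandwidth.
    omegas = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
              for _ in range(num_features)]
    # Random phases, uniform on [0, 2*pi).
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return features


def gaussian_kernel(x, y, bandwidth):
    """Exact Gaussian (RBF) kernel value, used here only for comparison."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    z = make_rff(dim=2, num_features=5000, bandwidth=1.0)
    x, y = [0.3, -0.2], [0.1, 0.4]
    print("approximate kernel:", dot(z(x), z(y)))
    print("exact kernel:      ", gaussian_kernel(x, y, 1.0))
```

With enough random features, the inner product of the two feature vectors should be close to the exact kernel value, which is what allows kernel-based quantities to be handled with ordinary finite-dimensional linear algebra.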
