Closing the learning-planning loop with predictive state representations

A central problem in artificial intelligence is to choose actions that maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate environment model and then plan to maximize reward. Unfortunately, learning algorithms often recover a model that is too inaccurate to support planning, or too large and complex for planning to succeed; alternatively, they require excessive prior domain knowledge or fail to provide guarantees such as statistical consistency. To address this gap, we propose a novel algorithm that provably learns a compact, accurate model directly from sequences of action-observation pairs. We then evaluate the learner by closing the loop from observations to actions. In more detail, we present a spectral algorithm for learning a predictive state representation (PSR) and evaluate it in a simulated, vision-based mobile robot planning task, showing that the learned PSR captures the essential features of the environment and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons; and our close-the-loop experiments provide an end-to-end practical test.
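
For concreteness, below is a minimal sketch of the kind of spectral PSR learner the abstract describes. It assumes we have already estimated, from action-observation data, probability matrices relating a finite set of tests and indicative histories, and it recovers an observable-operator parameterization via a thin SVD and pseudoinverses. The function and variable names (learn_psr_spectral, P_T, P_H, P_TH, P_TaoH) are illustrative rather than the paper's exact interface, and the paper's actual algorithm additionally handles high-dimensional, continuous observations through feature mappings; this sketch only illustrates the core spectral computation.

```python
import numpy as np

def learn_psr_spectral(P_T, P_H, P_TH, P_TaoH, k):
    """Sketch of spectral PSR learning from empirical probability estimates.

    P_T    : (n_tests,)            probability of each test from the initial state
    P_H    : (n_hists,)            probability of each indicative history
    P_TH   : (n_tests, n_hists)    joint probability of test i succeeding after history j
    P_TaoH : dict {(a, o): (n_tests, n_hists) array}
             joint probability of test i succeeding after history j followed by
             executing action a and observing o
    k      : dimension of the learned predictive state

    Returns an observable-operator model (b1, binf, B) such that
    P(o_1..o_t || a_1..a_t) ~= binf^T B[(a_t, o_t)] ... B[(a_1, o_1)] b1.
    """
    # Low-dimensional subspace of test predictions from a thin SVD.
    U, _, _ = np.linalg.svd(P_TH, full_matrices=False)
    U = U[:, :k]                                  # (n_tests, k)

    UP = U.T @ P_TH                               # (k, n_hists)
    UP_pinv = np.linalg.pinv(UP)                  # (n_hists, k)

    b1 = U.T @ P_T                                # initial predictive state
    binf = UP_pinv.T @ P_H                        # normalizer: binf^T UP = P_H^T
    B = {ao: U.T @ P_ao @ UP_pinv                 # one linear operator per (a, o) pair
         for ao, P_ao in P_TaoH.items()}
    return b1, binf, B

def filter_state(b, binf, B, a, o):
    """Bayes-filter update of the predictive state after taking action a
    and observing o."""
    b_new = B[(a, o)] @ b
    return b_new / (binf @ b_new)
```

After learning, the filtered state b can be fed to a point-based planner over the learned model in place of a POMDP belief state, which is how the close-the-loop evaluation described above would proceed.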
