Simple Local Models for Complex Dynamical Systems

We present a novel mathematical formalism for the idea of a "local model" of an uncontrolled dynamical system: a model that makes only certain predictions, and only in certain situations. Because its responsibilities are restricted, a local model may be far simpler than a complete model of the system. We then show how several local models might be combined to produce a more detailed model. We demonstrate the ability to learn a collection of local models on a large-scale example, and we present a preliminary empirical comparison between learning a collection of local models and other model-learning methods.
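To make the core idea concrete, here is a minimal sketch in Python. All names here (`LocalModel`, `combined_prediction`, the weather example) are hypothetical illustrations, not the paper's formalism: a local model pairs an applicability test over histories with predictions about a restricted set of features, and a combiner merges whatever the applicable models are willing to predict.

```python
# A minimal sketch (hypothetical names, not the paper's API) of the
# "local model" idea: each model predicts only certain features, and
# only in situations it recognizes.

from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class LocalModel:
    """A model with restricted responsibilities."""
    # Which observable features this model predicts.
    features: Sequence[str]
    # Situations in which the model is willing to predict:
    # a test on the observation history.
    applies: Callable[[List[str]], bool]
    # The prediction itself: history -> {feature: probability}.
    predict: Callable[[List[str]], Dict[str, float]]


def combined_prediction(models: List[LocalModel],
                        history: List[str]) -> Dict[str, float]:
    """Merge predictions from every local model that applies.

    If two applicable models predict the same feature, the later one
    in the list wins -- a naive resolution rule; how to combine
    overlapping local models is a design choice this sketch leaves open.
    """
    prediction: Dict[str, float] = {}
    for model in models:
        if model.applies(history):
            for feature in model.features:
                prediction[feature] = model.predict(history)[feature]
    return prediction


# Example: a local model that only predicts rain, and only in autumn.
autumn_rain = LocalModel(
    features=["rain"],
    applies=lambda h: len(h) > 0 and h[-1].startswith("autumn"),
    predict=lambda h: {"rain": 0.6},
)

print(combined_prediction([autumn_rain], ["autumn-day-1"]))  # {'rain': 0.6}
print(combined_prediction([autumn_rain], ["summer-day-1"]))  # {}
```

The point of the sketch is the restriction: outside its situations, a local model makes no claim at all, which is what allows each model to stay simple while a collection of them covers the system in more detail.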
