Learning to Make Predictions in Partially Observable Environments Without a Generative Model

When faced with the problem of learning a model of a high-dimensional environment, a common approach is to limit the model to make only a restricted set of predictions, thereby simplifying the learning problem. These partial models may be directly useful for making decisions or may be combined to form a more complete, structured model. However, in partially observable (non-Markov) environments, standard model-learning methods learn generative models (such as POMDPs), i.e., models that provide a probability distribution over all possible futures. It is not straightforward to restrict such models to make only certain predictions, and doing so does not always simplify the learning problem. In this paper we present prediction profile models: non-generative partial models for partially observable systems that make only a given set of predictions and are therefore, in some cases, far simpler than generative models. We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.
