Bayesian Learning of Recursively Factored Environments

Model-based reinforcement learning techniques have historically had difficulty scaling up to large observation spaces. One promising approach is to decompose the model learning task into smaller, more manageable sub-problems by factoring the observation space. Typically, many different factorizations are possible, which can make it difficult to select an appropriate one without extensive testing. In this paper we introduce the class of recursively decomposable factorizations, and show how exact Bayesian inference can be used to efficiently guarantee predictive performance close to that of the best factorization in this class. We demonstrate the strength of this approach with empirical results on 20 different Atari 2600 games.
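
To make the "close to the best factorization" guarantee concrete, the standard route is a Bayesian mixture argument; the sketch below uses generic notation (a model class \(\mathcal{M}\) of candidate factorizations with prior weights \(w_\rho\)) assumed here for illustration rather than taken from the paper. The mixture predictor is

\[
\xi(x_{1:n}) \;=\; \sum_{\rho \in \mathcal{M}} w_\rho \, \rho(x_{1:n}),
\qquad \sum_{\rho \in \mathcal{M}} w_\rho = 1, \quad w_\rho > 0,
\]

and since \(\xi(x_{1:n}) \ge w_\rho \, \rho(x_{1:n})\) for every \(\rho \in \mathcal{M}\), its cumulative log-loss satisfies

\[
-\log_2 \xi(x_{1:n}) \;\le\; -\log_2 \rho(x_{1:n}) + \log_2 \frac{1}{w_\rho}.
\]

That is, the Bayesian mixture is never worse than the best factorization in the class by more than a constant (the negative log prior weight of that factorization), independent of the sequence length \(n\). When the class is recursively structured, context-tree-weighting-style recursions can compute such a mixture exactly without enumerating every factorization.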
