Learning to Plan from Raw Data in Grid-based Games

An agent that autonomously learns to act in its environment must acquire a model of the domain dynamics. This can be challenging, especially in real-world domains, where observations are high-dimensional and noisy. Although in automated planning the dynamics are typically given, action schema learning approaches exist that learn symbolic rules (e.g., in STRIPS or PDDL) to be used by traditional planners. These algorithms, however, rely on logical descriptions of the environment observations. In contrast, recent methods in deep reinforcement learning for games learn directly from pixel observations, but they typically do not acquire an environment model; instead, they learn a policy for one-step action selection. Even when a model is learned, it usually does not generalize to unseen instances of the training domain. Here we propose a neural network-based method that learns, from visual observations, an approximate, compact, implicit representation of the domain dynamics, which can be used for planning with standard search algorithms and generalizes to novel domain instances. The learned model is composed of submodules, each implicitly representing an action schema in the traditional sense. We evaluate our approach on visual versions of the standard planning domain Sokoban, and show that, by training on a single instance of the game, the method learns a transition model that can be successfully used to solve new levels.
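To make the core idea concrete, the sketch below shows one way such a model could be organized: one learned submodule per action, each playing the role of an implicit action schema that predicts the next observation, combined with a standard forward search that uses the learned model as a simulator. This is a minimal illustration under assumed design choices (PyTorch, a small convolutional predictor, greedy best-first search, and hypothetical names such as ActionSchemaModule, TransitionModel, and plan), not the paper's actual architecture.

```python
# Minimal sketch of a per-action transition model plus search-based planning.
# All module and function names here are illustrative, not from the paper.
import heapq
import itertools
import torch
import torch.nn as nn


class ActionSchemaModule(nn.Module):
    """Convolutional submodule for one fixed action: given the current grid
    observation, predict the next one (an implicit action schema)."""
    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class TransitionModel(nn.Module):
    """The full learned transition model: one submodule per action."""
    def __init__(self, channels: int, n_actions: int):
        super().__init__()
        self.schemas = nn.ModuleList(
            ActionSchemaModule(channels) for _ in range(n_actions)
        )

    def forward(self, obs: torch.Tensor, action: int) -> torch.Tensor:
        return self.schemas[action](obs)


def plan(model, init_obs, is_goal, heuristic, n_actions, max_nodes=10_000):
    """Greedy best-first search over the learned model. `is_goal` and
    `heuristic` are problem-specific callables supplied by the user.
    `init_obs` is a (1, C, H, W) float tensor on the CPU."""
    counter = itertools.count()  # tie-breaker so tensors are never compared
    frontier = [(heuristic(init_obs), next(counter), init_obs, [])]
    seen = set()
    while frontier and max_nodes > 0:
        _, _, obs, path = heapq.heappop(frontier)
        # Discretize the predicted observation so states can be hashed.
        key = obs.round().to(torch.int8).numpy().tobytes()
        if key in seen:
            continue
        seen.add(key)
        if is_goal(obs):
            return path  # sequence of action indices
        max_nodes -= 1
        for a in range(n_actions):
            with torch.no_grad():
                nxt = model(obs, a)
            heapq.heappush(
                frontier, (heuristic(nxt), next(counter), nxt, path + [a])
            )
    return None  # no plan found within the node budget
```

For a Sokoban-style grid encoded as one channel per object type, the goal test and heuristic could, for instance, compare the predicted box channel against the target channel; any admissible or greedy heuristic over the predicted observation would slot into the same search loop.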
