Planning with Goal-Conditioned Policies

Planning methods can solve temporally extended sequential decision-making problems by composing simple behaviors. However, planning requires suitable abstractions of states and transitions, which typically must be designed by hand. In contrast, reinforcement learning (RL) can acquire behaviors directly from low-level inputs, but struggles with temporally extended tasks. Can we use reinforcement learning to automatically form the abstractions needed for planning, and thus obtain the best of both approaches? We show that goal-conditioned policies learned with RL can be incorporated into planning, so that a planner can focus on which states to reach rather than how those states are reached. However, with complex state observations such as images, not all inputs represent valid states. We therefore also propose using a latent variable model to compactly represent the set of valid states for the planner: the policies provide an abstraction of actions, and the latent variable model provides an abstraction of states. We compare our method with planning-based and model-free baselines and find that it significantly outperforms prior work on image-based tasks that require non-greedy, multi-stage behavior.
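The recipe described above can be summarized as: encode observations and goals into the latent space of a generative model, optimize a sequence of latent subgoals whose reachability is scored by a goal-conditioned value function, and hand the resulting subgoals to the goal-conditioned policy to execute. The sketch below illustrates that planning step with the cross-entropy method; it is a minimal illustration under stated assumptions, not the paper's actual code, and every name in it (LATENT_DIM, value_fn, plan_subgoals, the CEM hyperparameters) is hypothetical. In particular, the placeholder value function stands in for a learned goal-conditioned value, and the unit-Gaussian sampling stands in for the latent variable model's prior over valid states.

```python
# Minimal sketch: plan a sequence of latent subgoals with the cross-entropy
# method (CEM), scoring candidates with a goal-conditioned value function.
# All names and hyperparameters here are illustrative assumptions.
import numpy as np

LATENT_DIM = 4                    # dimensionality of the latent state space (assumed)
NUM_SUBGOALS = 3                  # number of intermediate subgoals to plan
POP, ELITE, ITERS = 256, 32, 5    # CEM population size, elite count, iterations


def value_fn(z_from, z_to):
    # Placeholder for a learned goal-conditioned value V(s, g): higher means
    # the policy can more easily reach z_to from z_from. Negative Euclidean
    # distance is used here purely as a stand-in.
    return -np.linalg.norm(z_from - z_to, axis=-1)


def plan_subgoals(z_start, z_goal):
    """Optimize a sequence of latent subgoals between z_start and z_goal."""
    mu = np.zeros((NUM_SUBGOALS, LATENT_DIM))
    sigma = np.ones((NUM_SUBGOALS, LATENT_DIM))
    for _ in range(ITERS):
        # Sample candidate subgoal sequences; in the full method the latent
        # variable model's prior keeps samples on the manifold of valid states.
        cands = mu + sigma * np.random.randn(POP, NUM_SUBGOALS, LATENT_DIM)
        # Score each sequence by summing the value of every consecutive
        # segment: start -> subgoal_1 -> ... -> subgoal_k -> goal.
        seqs = np.concatenate(
            [np.tile(z_start, (POP, 1, 1)), cands, np.tile(z_goal, (POP, 1, 1))],
            axis=1)
        scores = value_fn(seqs[:, :-1], seqs[:, 1:]).sum(axis=1)
        # Refit the sampling distribution to the highest-scoring sequences.
        elite = cands[np.argsort(scores)[-ELITE:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu  # planned latent subgoals for the goal-conditioned policy to pursue


# Usage: plan from a random start latent to a random goal latent.
z_start = np.random.randn(1, LATENT_DIM)
z_goal = np.random.randn(1, LATENT_DIM)
subgoals = plan_subgoals(z_start, z_goal)
print(subgoals.shape)  # (NUM_SUBGOALS, LATENT_DIM)
```

In this sketch the planner only decides which latent states to reach; how each subgoal is reached is delegated entirely to the goal-conditioned policy, which is the division of labor the abstract describes.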
