Regression Planning Networks

Recent learning-to-plan methods have shown promising results in planning directly from observation space. Yet their ability to plan for long-horizon tasks is limited by the accuracy of the prediction model. Classical symbolic planners, on the other hand, show remarkable capabilities in solving long-horizon tasks, but they require predefined symbolic rules and symbolic states, which restricts their real-world applicability. In this work, we combine the benefits of these two paradigms and propose a learning-to-plan method that directly generates a long-term symbolic plan conditioned on high-dimensional observations. We borrow the idea of regression (backward) planning from the classical planning literature and introduce Regression Planning Networks (RPN), a neural network architecture that plans backward from a task goal and generates a sequence of intermediate goals that reaches the current observation. We show that our model not only inherits many favorable traits from symbolic planning, including the ability to solve previously unseen tasks, but can also learn from visual inputs in an end-to-end manner. We evaluate RPN in a grid world environment and in a simulated 3D kitchen environment featuring complex visual scenes and long task horizons, and show that it achieves near-optimal performance on completely new task instances.
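The regression (backward) planning idea that RPN borrows can be illustrated with classical STRIPS-style goal regression. The following is a minimal sketch of that hand-specified symbolic version, not of RPN itself. In RPN, a learned network takes the place of the hand-written operators and predicts intermediate goals from observations. The toy domain, its predicates, and the names `Action`, `regress`, and `regression_plan` are hypothetical. For brevity, the search does not check for mutually exclusive facts, so some inconsistent subgoals are generated and then simply dead-end.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style operator: preconditions, add effects, delete effects."""
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset


def regress(goal, action):
    """Regress a goal through an action; return the subgoal or None."""
    if not (action.add & goal):
        return None  # irrelevant: the action achieves no goal fact
    if action.delete & goal:
        return None  # inconsistent: the action destroys a goal fact
    # Goal facts the action does not provide must already hold, plus its preconditions.
    return frozenset((goal - action.add) | action.pre)


def regression_plan(state, goal, actions, max_depth=10):
    """Breadth-first backward search from `goal` until a subgoal holds in `state`.

    Returns (subgoals, plan) in forward execution order, or None if no plan
    is found within `max_depth` steps. No mutex checking is performed.
    """
    start = frozenset(goal)
    frontier = deque([(start, [start], [])])
    seen = {start}
    while frontier:
        g, subgoals, plan = frontier.popleft()
        if g <= state:  # the current state already satisfies this intermediate goal
            return subgoals[::-1], plan[::-1]
        if len(plan) >= max_depth:
            continue
        for a in actions:
            g2 = regress(g, a)
            if g2 is not None and g2 not in seen:
                seen.add(g2)
                frontier.append((g2, subgoals + [g2], plan + [a.name]))
    return None


# Hypothetical toy kitchen domain (predicate and action names are illustrative only).
ACTIONS = [
    Action("pick_cabbage",
           pre=frozenset({"cabbage_on_counter", "hand_empty"}),
           add=frozenset({"holding_cabbage"}),
           delete=frozenset({"cabbage_on_counter", "hand_empty"})),
    Action("put_cabbage_in_pan",
           pre=frozenset({"holding_cabbage", "pan_on_stove"}),
           add=frozenset({"cabbage_in_pan", "hand_empty"}),
           delete=frozenset({"holding_cabbage"})),
    Action("turn_on_stove",
           pre=frozenset({"hand_empty"}),
           add=frozenset({"stove_on"}),
           delete=frozenset()),
]

initial_state = frozenset({"cabbage_on_counter", "hand_empty", "pan_on_stove"})
goal = {"cabbage_in_pan", "stove_on"}

result = regression_plan(initial_state, goal, ACTIONS)
if result is not None:
    subgoals, plan = result
    print(plan)  # expected: ['turn_on_stove', 'pick_cabbage', 'put_cabbage_in_pan']
```

Read in order, `subgoals` forms a chain of intermediate goals. It starts with a subgoal that the current state already satisfies and ends with the task goal. Search proceeds from the goal toward the current state, which is the direction the abstract describes for RPN's generated goal sequence.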
