论文信息 - Regression Planning Networks

Regression Planning Networks

Recent learning-to-plan methods have shown promising results on planning directly from observation space. Yet, their ability to plan for long-horizon tasks is limited by the accuracy of the prediction model. On the other hand, classical symbolic planners show remarkable capabilities in solving long-horizon tasks, but they require predefined symbolic rules and symbolic states, restricting their real-world applicability. In this work, we combine the benefits of these two paradigms and propose a learning-to-plan method that can directly generate a long-term symbolic plan conditioned on high-dimensional observations. We borrow the idea of regression (backward) planning from classical planning literature and introduce Regression Planning Networks (RPN), a neural network architecture that plans backward starting at a task goal and generates a sequence of intermediate goals that reaches the current observation. We show that our model not only inherits many favorable traits from symbolic planning --including the ability to solve previously unseen tasks-- but also can learn from visual inputs in an end-to-end manner. We evaluate the capabilities of RPN in a grid world environment and a simulated 3D kitchen environment featuring complex visual scenes and long task horizon, and show that it achieves near-optimal performance in completely new task instances.

[1] Jitendra Malik,et al. Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[2] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[3] Richard Waldinger,et al. Achieving several goals simultaneously , 1977 .

[4] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[5] Russell H. Taylor,et al. Automatic Synthesis of Fine-Motion Strategies for Robots , 1984 .

[6] Leslie Pack Kaelbling,et al. Learning Quickly to Plan Quickly Using Modular Meta-Learning , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[7] Alex S. Fukunaga,et al. Classical Planning in Deep Latent Space: Bridging the Subsymbolic-Symbolic Boundary , 2017, AAAI.

[8] Danfei Xu,et al. Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Craig A. Knoblock,et al. PDDL-the planning domain definition language , 1998 .

[10] Richard Fikes,et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving , 1971, IJCAI.

[11] Sergey Levine,et al. Deep visual foresight for planning robot motion , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[12] Christopher Burgess,et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[13] Allan Jabri,et al. Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control , 2018, ICML.

[14] Leslie Pack Kaelbling,et al. Pre-image Backchaining in Belief Space for Mobile Manipulation , 2011, ISRR.

[15] Leslie Pack Kaelbling,et al. Hierarchical task and motion planning in the now , 2011, 2011 IEEE International Conference on Robotics and Automation.

[16] S. LaValle. Rapidly-exploring random trees : a new tool for path planning , 1998 .

[17] Sergey Levine,et al. Reasoning About Physical Interactions with Object-Oriented Prediction and Planning , 2018, ICLR.

[18] Leslie Pack Kaelbling,et al. Integrated task and motion planning in belief space , 2013, Int. J. Robotics Res..

[19] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .

[20] Honglak Lee,et al. Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[21] Ruben Villegas,et al. Learning Latent Dynamics for Planning from Pixels , 2018, ICML.

[22] Nando de Freitas,et al. Neural Programmer-Interpreters , 2015, ICLR.

[23] Honglak Lee,et al. Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies , 2018, NeurIPS.