Neural Task Programming: Learning to Generalize Across Hierarchical Tasks

In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the ideas of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., a video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, in which bottom-level programs are callable subroutines that interact with the environment. We validate our method on three robot manipulation tasks. NTP achieves strong generalization across sequential tasks that exhibit hierarchical and compositional structures. The experimental results show that NTP generalizes well to unseen tasks with increasing lengths, variable topologies, and changing objectives. Project page: stanfordvl.github.io/ntp/.
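The recursive decomposition described above can be sketched as a small interpreter. This is a minimal illustration only, not the paper's implementation: in NTP the core that maps a (program, task specification) pair to sub-programs and sub-specifications is a learned neural network, which is approximated here by a hand-written rule table, and all program and object names (`sort`, `pick_and_place`, `core`, etc.) are hypothetical.

```python
PRIMITIVES = {"pick", "place"}  # bottom-level programs that act on the environment


def core(program, spec):
    """Stand-in for the learned NTP core: decompose a task specification
    into an ordered list of (sub-program, sub-specification) calls."""
    if program == "sort":
        # Decompose the demonstration into one pick-and-place per object.
        return [("pick_and_place", obj) for obj in spec]
    if program == "pick_and_place":
        return [("pick", spec), ("place", spec)]
    raise ValueError(f"unknown program: {program}")


def run(program, spec, env_log):
    """Recursively expand programs until bottom-level subroutines are reached;
    only those primitives interact with the environment."""
    if program in PRIMITIVES:
        env_log.append((program, spec))  # primitive call: touch the environment
        return
    for sub_program, sub_spec in core(program, spec):
        run(sub_program, sub_spec, env_log)


env_log = []
run("sort", ["red_block", "blue_block"], env_log)
# env_log now holds the flat sequence of primitive environment calls
```

Because generalization comes from recomposing the same programs under new specifications, a longer demonstration (more objects in the spec) changes only the recursion depth and branching, not the program set itself.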
