Learning Navigation Subroutines by Watching Videos

Hierarchies are an effective way to boost sample efficiency in reinforcement learning and computational efficiency in classical planning. However, acquiring hierarchies via hand-design (as in classical planning) is suboptimal, while acquiring them via end-to-end reward-based training (as in reinforcement learning) is unstable and still prohibitively expensive. In this paper, we pursue an alternative paradigm for acquiring such hierarchical abstractions (or visuo-motor subroutines): the use of passive first-person observation data. We use an inverse model, trained on small amounts of interaction data, to pseudo-label the passive first-person videos with agent actions. Visuo-motor subroutines are then acquired from these pseudo-labeled videos by learning a latent intent-conditioned policy that predicts the inferred pseudo-actions from the corresponding image observations. We demonstrate our proposed approach in the context of navigation, and show that we can successfully learn consistent and diverse visuo-motor subroutines from passive first-person videos. We demonstrate the utility of the acquired subroutines by using them as-is for exploration, and as sub-policies in a hierarchical RL framework for reaching point goals and semantic goals. We also demonstrate the behavior of our subroutines in the real world by deploying them on a real robotic platform. Project website with videos, code and data: this https URL.
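To make the pipeline concrete, below is a minimal sketch of the three stages the abstract describes: fitting an inverse model on interaction data, pseudo-labeling passive video, and fitting a latent intent-conditioned policy to the pseudo-actions. This is an illustrative PyTorch sketch, not the paper's implementation: the feature-vector observations (stand-ins for image encodings), network sizes, the 4-way action space, and the choice of a discrete Gumbel-Softmax intent variable are all assumptions.

```python
# Illustrative sketch only; shapes, sizes, and the Gumbel-Softmax latent are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, NUM_ACTIONS, NUM_INTENTS = 128, 4, 4  # hypothetical dimensions


class InverseModel(nn.Module):
    """Predicts the action taken between two consecutive observations."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS))

    def forward(self, feat_t, feat_t1):
        return self.head(torch.cat([feat_t, feat_t1], dim=-1))


class IntentPolicy(nn.Module):
    """pi(a | o, z): per-step action prediction conditioned on a discrete intent z."""
    def __init__(self):
        super().__init__()
        # q(z | trajectory): infers the intent from the whole snippet (mean-pooled).
        self.intent_logits = nn.Sequential(
            nn.Linear(FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_INTENTS))
        self.policy = nn.Sequential(
            nn.Linear(FEAT_DIM + NUM_INTENTS, 256), nn.ReLU(),
            nn.Linear(256, NUM_ACTIONS))

    def forward(self, traj_feats):                        # traj_feats: (B, T, FEAT_DIM)
        logits = self.intent_logits(traj_feats.mean(dim=1))
        z = F.gumbel_softmax(logits, tau=1.0, hard=True)  # differentiable one-hot intent
        z_rep = z.unsqueeze(1).expand(-1, traj_feats.size(1), -1)
        return self.policy(torch.cat([traj_feats, z_rep], dim=-1))


# Stage 1: fit the inverse model on a small interaction dataset.
inv = InverseModel()
opt = torch.optim.Adam(inv.parameters(), lr=1e-3)
feat_t, feat_t1 = torch.randn(32, FEAT_DIM), torch.randn(32, FEAT_DIM)  # stand-ins for
actions = torch.randint(0, NUM_ACTIONS, (32,))                          # real (o, a, o') tuples
loss = F.cross_entropy(inv(feat_t, feat_t1), actions)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: pseudo-label passive first-person video with inferred actions.
video = torch.randn(32, 9, FEAT_DIM)                     # (B, T+1) frame features
with torch.no_grad():
    pseudo = inv(video[:, :-1].reshape(-1, FEAT_DIM),
                 video[:, 1:].reshape(-1, FEAT_DIM)).argmax(dim=-1)     # (B*T,)

# Stage 3: fit the intent-conditioned policy to predict the pseudo-actions.
policy = IntentPolicy()
popt = torch.optim.Adam(policy.parameters(), lr=1e-3)
action_logits = policy(video[:, :-1])                    # (B, T, NUM_ACTIONS)
ploss = F.cross_entropy(action_logits.reshape(-1, NUM_ACTIONS), pseudo)
popt.zero_grad(); ploss.backward(); popt.step()
```

At deployment, a subroutine would be executed by fixing z to one of the NUM_INTENTS one-hot codes and running pi(a | o, z) closed-loop, which is what allows the learned subroutines to be used as-is for exploration or as sub-policies under a hierarchical controller.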
