Learning Navigation Subroutines by Watching Videos

Hierarchies are an effective way to boost sample efficiency in reinforcement learning and computational efficiency in classical planning. However, acquiring hierarchies via hand-design (as in classical planning) is suboptimal, while acquiring them via end-to-end reward-based training (as in reinforcement learning) is unstable and still prohibitively expensive. In this paper, we pursue an alternative paradigm for acquiring such hierarchical abstractions (or visuo-motor subroutines): the use of passive first-person observation data. We use an inverse model, trained on small amounts of interaction data, to pseudo-label the passive first-person videos with agent actions. Visuo-motor subroutines are then acquired from these pseudo-labeled videos by learning a latent intent-conditioned policy that predicts the inferred pseudo-actions from the corresponding image observations. We demonstrate our proposed approach in the context of navigation, and show that we can successfully learn consistent and diverse visuo-motor subroutines from passive first-person videos. We demonstrate the utility of the acquired subroutines by using them as-is for exploration, and as sub-policies in a hierarchical RL framework for reaching point goals and semantic goals. We also demonstrate the behavior of our subroutines in the real world by deploying them on a real robotic platform. Project website with videos, code and data: this https URL.
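To make the pipeline concrete, below is a minimal sketch of the three stages the abstract describes: fitting an inverse model on interaction data, pseudo-labeling passive video, and fitting a latent intent-conditioned policy to the pseudo-actions. This is an illustrative PyTorch sketch, not the paper's implementation: the feature-vector observations (stand-ins for image encodings), network sizes, the 4-way action space, and the choice of a discrete Gumbel-Softmax intent variable are all assumptions.

```python
# Illustrative sketch only; shapes, sizes, and the Gumbel-Softmax latent are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, NUM_ACTIONS, NUM_INTENTS = 128, 4, 4  # hypothetical dimensions


class InverseModel(nn.Module):
    """Predicts the action taken between two consecutive observations."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(2 * FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_ACTIONS))

    def forward(self, feat_t, feat_t1):
        return self.head(torch.cat([feat_t, feat_t1], dim=-1))


class IntentPolicy(nn.Module):
    """pi(a | o, z): per-step action prediction conditioned on a discrete intent z."""
    def __init__(self):
        super().__init__()
        # q(z | trajectory): infers the intent from the whole snippet (mean-pooled).
        self.intent_logits = nn.Sequential(
            nn.Linear(FEAT_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_INTENTS))
        self.policy = nn.Sequential(
            nn.Linear(FEAT_DIM + NUM_INTENTS, 256), nn.ReLU(),
            nn.Linear(256, NUM_ACTIONS))

    def forward(self, traj_feats):                        # traj_feats: (B, T, FEAT_DIM)
        logits = self.intent_logits(traj_feats.mean(dim=1))
        z = F.gumbel_softmax(logits, tau=1.0, hard=True)  # differentiable one-hot intent
        z_rep = z.unsqueeze(1).expand(-1, traj_feats.size(1), -1)
        return self.policy(torch.cat([traj_feats, z_rep], dim=-1))


# Stage 1: fit the inverse model on a small interaction dataset.
inv = InverseModel()
opt = torch.optim.Adam(inv.parameters(), lr=1e-3)
feat_t, feat_t1 = torch.randn(32, FEAT_DIM), torch.randn(32, FEAT_DIM)  # stand-ins for
actions = torch.randint(0, NUM_ACTIONS, (32,))                          # real (o, a, o') tuples
loss = F.cross_entropy(inv(feat_t, feat_t1), actions)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: pseudo-label passive first-person video with inferred actions.
video = torch.randn(32, 9, FEAT_DIM)                     # (B, T+1) frame features
with torch.no_grad():
    pseudo = inv(video[:, :-1].reshape(-1, FEAT_DIM),
                 video[:, 1:].reshape(-1, FEAT_DIM)).argmax(dim=-1)     # (B*T,)

# Stage 3: fit the intent-conditioned policy to predict the pseudo-actions.
policy = IntentPolicy()
popt = torch.optim.Adam(policy.parameters(), lr=1e-3)
action_logits = policy(video[:, :-1])                    # (B, T, NUM_ACTIONS)
ploss = F.cross_entropy(action_logits.reshape(-1, NUM_ACTIONS), pseudo)
popt.zero_grad(); ploss.backward(); popt.step()
```

At deployment, a subroutine would be executed by fixing z to one of the NUM_INTENTS one-hot codes and running pi(a | o, z) closed-loop, which is what allows the learned subroutines to be used as-is for exploration or as sub-policies under a hierarchical controller.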
