Learning to Visually Navigate in Photorealistic Environments Without any Supervision

Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is challenging, in part because the lack of position information makes it difficult to provide supervision during training. In this paper, we introduce a novel approach for learning to navigate from image inputs without external supervision or reward. Our approach consists of three stages: learning a good representation of first-person views, then learning to explore using memory, and finally learning to navigate by having the agent set its own goals. The model is trained with intrinsic rewards only, so it can be applied to any environment with image observations. We show the benefits of our approach by training an agent to navigate challenging photorealistic environments from the Gibson dataset with RGB inputs only.
