Learning to Explore using Active Neural SLAM

This work presents a modular and hierarchical approach to learning policies for exploring 3D environments, called "Active Neural SLAM". Our approach leverages the strengths of both classical and learning-based methods by combining analytical path planners with a learned SLAM module and with learned global and local policies. Learning provides flexibility with respect to input modalities (in the SLAM module), exploits structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). Using learning within each module retains these benefits, while hierarchical decomposition and modular training allow us to sidestep the high sample complexity associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our approach over past learning-based and geometry-based approaches. The proposed model can also be easily transferred to the PointGoal task and was the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.
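The modular decomposition described above can be illustrated with a toy sketch. It is not the authors' implementation: every name below (`slam_update`, `global_policy`, `plan`, `local_policy`, `explore`) is hypothetical, the world is a small 2D grid rather than a 3D simulator, and the learned components are replaced by hand-written stand-ins (a perfect short-range sensor for the SLAM module, a nearest-unknown-cell heuristic for the global policy, a direct move for the local policy). Only the path planner is analytical, as in the abstract; breadth-first search is an assumption, since the abstract does not name a specific planner.

```python
from collections import deque

UNKNOWN, FREE, WALL = -1, 0, 1
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def slam_update(belief, world, pose, radius=1):
    """Stand-in for the learned SLAM module: reveal cells near the agent."""
    r, c = pose
    for i in range(max(0, r - radius), min(len(world), r + radius + 1)):
        for j in range(max(0, c - radius), min(len(world[0]), c + radius + 1)):
            belief[i][j] = world[i][j]


def bfs_parents(belief, start):
    """BFS over cells not known to be walls; returns {cell: parent} in visit order."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < len(belief) and 0 <= nxt[1] < len(belief[0])
            if inside and nxt not in parents and belief[nxt[0]][nxt[1]] != WALL:
                parents[nxt] = (r, c)
                queue.append(nxt)
    return parents


def global_policy(belief, pose):
    """Stand-in for the learned global policy: nearest reachable unknown cell."""
    for cell in bfs_parents(belief, pose):
        if belief[cell[0]][cell[1]] == UNKNOWN:
            return cell
    return None


def plan(belief, pose, goal):
    """Analytical planner: first waypoint on a shortest path from pose to goal."""
    parents = bfs_parents(belief, pose)
    if goal not in parents:
        return None
    cell = goal
    while parents[cell] is not None and parents[cell] != pose:
        cell = parents[cell]
    return cell


def local_policy(pose, waypoint):
    """Stand-in for the learned local policy: step directly to the waypoint."""
    return (waypoint[0] - pose[0], waypoint[1] - pose[1])


def explore(world, start, max_steps=200):
    """Run the SLAM -> global policy -> planner -> local policy loop."""
    belief = [[UNKNOWN] * len(world[0]) for _ in world]
    pose, goal = start, None
    for _ in range(max_steps):
        slam_update(belief, world, pose)
        waypoint = None
        if goal is not None and belief[goal[0]][goal[1]] == UNKNOWN:
            waypoint = plan(belief, pose, goal)
        if waypoint is None:
            # Long-term goal is missing, already observed, or unreachable.
            goal = global_policy(belief, pose)
            if goal is None:
                break  # nothing reachable is left to explore
            waypoint = plan(belief, pose, goal)
        dr, dc = local_policy(pose, waypoint)
        pose = (pose[0] + dr, pose[1] + dc)
    known = sum(v != UNKNOWN for row in belief for v in row)
    return belief, known / (len(world) * len(world[0]))


world = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
belief, coverage = explore(world, (0, 0))
print(coverage)
```

The point of the sketch is the division of labor: the map estimate, the choice of long-term goal, the path to that goal, and the low-level action are produced by separate components, so each can be replaced or trained independently of the others.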
