Learning To Explore Using Active Neural Mapping

This work presents a modular and hierarchical approach to learn policies for exploring 3D environments. Our approach leverages the strengths of both classical and learning-based methods, by using analytical path planners with learned mappers, and global and local policies. Use of learning provides flexibility with respect to input modalities (in mapper), leverages structural regularities of the world (in global policies), and provides robustness to errors in state estimation (in local policies). Such use of learning within each module retains its benefits, while at the same time, hierarchical decomposition and modular training allow us to sidestep the high sample complexities associated with training end-to-end policies. Our experiments in visually and physically realistic simulated 3D environments demonstrate the effectiveness of our proposed approach over past learning and geometry-based approaches.

[1]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[2]  John Canny,et al.  The complexity of robot motion planning , 1988 .

[3]  B. Faverjon,et al.  Probabilistic Roadmaps for Path Planning in High-Dimensional Con(cid:12)guration Spaces , 1996 .

[4]  José Ruíz Ascencio,et al.  Visual simultaneous localization and mapping: a survey , 2012, Artificial Intelligence Review.

[5]  Nando de Freitas,et al.  A Bayesian exploration-exploitation approach for optimal online sensing and planning with a visually guided mobile robot , 2009, Auton. Robots.

[6]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[7]  Brian Yamauchi,et al.  A frontier-based approach for autonomous exploration , 1997, Proceedings 1997 IEEE International Symposium on Computational Intelligence in Robotics and Automation CIRA'97. 'Towards New Computational Principles for Robotics and Automation'.

[8]  Patric Jensfelt,et al.  Large-scale semantic mapping and reasoning with heterogeneous modalities , 2012, 2012 IEEE International Conference on Robotics and Automation.

[9]  Andrew J. Davison,et al.  DTAM: Dense tracking and mapping in real-time , 2011, 2011 International Conference on Computer Vision.

[10]  Vladlen Koltun,et al.  Semi-parametric Topological Memory for Navigation , 2018, ICLR.

[11]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[12]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[14]  Stefan Lee,et al.  Embodied Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[16]  Marc Pollefeys,et al.  Episodic Curiosity through Reachability , 2018, ICLR.

[17]  Sridhar Mahadevan,et al.  Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..

[18]  Matthias Nießner,et al.  Autonomous reconstruction of unknown indoor scenes guided by time-varying tensor fields , 2017, ACM Trans. Graph..

[19]  Wolfram Burgard,et al.  Neural SLAM: Learning to Explore with External Memory , 2017, 1706.09520.

[20]  Nicholas Roy,et al.  Trajectory Optimization using Reinforcement Learning for Map Exploration , 2008, Int. J. Robotics Res..

[21]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[22]  Abhinav Gupta,et al.  PyRobot: An Open-source Robotics Framework for Research and Benchmarking , 2019, ArXiv.

[23]  Steven M. LaValle,et al.  Rapidly-Exploring Random Trees: Progress and Prospects , 2000 .

[24]  Ruslan Salakhutdinov,et al.  Active Neural Localization , 2018, ICLR.

[25]  Matthew R. Walter,et al.  Learning Semantic Maps from Natural Language Descriptions , 2013, Robotics: Science and Systems.

[26]  Alexander Kleiner,et al.  A frontier-void-based approach for autonomous exploration in 3d , 2011 .

[27]  Stefan Lee,et al.  Neural Modular Control for Embodied Question Answering , 2018, CoRL.

[28]  Ruslan Salakhutdinov,et al.  Neural Map: Structured Memory for Deep Reinforcement Learning , 2017, ICLR.

[29]  Jitendra Malik,et al.  Combining Optimal Control and Learning for Visual Navigation in Novel Environments , 2019, CoRL.

[30]  Morgan Quigley,et al.  ROS: an open-source Robot Operating System , 2009, ICRA 2009.

[31]  Ali Farhadi,et al.  IQA: Visual Question Answering in Interactive Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  J A Sethian,et al.  A fast marching level set method for monotonically advancing fronts. , 1996, Proceedings of the National Academy of Sciences of the United States of America.

[34]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[35]  Stewart W. Wilson,et al.  A Possibility for Implementing Curiosity and Boredom in Model-Building Neural Controllers , 1991 .

[36]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2016, IEEE Transactions on Robotics.

[37]  Geoffrey E. Hinton,et al.  Feudal Reinforcement Learning , 1992, NIPS.

[38]  Vladlen Koltun,et al.  Beauty and the Beast: Optimal Methods Meet Learning for Drone Racing , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[39]  Vijay Kumar,et al.  End-to-End Navigation in Unknown Environments using Neural Networks , 2017, ArXiv.

[40]  Guillaume Lample,et al.  Playing FPS Games with Deep Reinforcement Learning , 2016, AAAI.

[41]  Jitendra Malik,et al.  Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Michael Kearns,et al.  Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.

[43]  Sven Behnke,et al.  Evaluating the Efficiency of Frontier-based Exploration Strategies , 2010, ISR/ROBOTIK.

[44]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[45]  Pieter Abbeel,et al.  Value Iteration Networks , 2016, NIPS.

[46]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[47]  Tao Chen,et al.  Learning Exploration Policies for Navigation , 2019, ICLR.

[48]  Wolfram Burgard,et al.  Information Gain-based Exploration Using Rao-Blackwellized Particle Filters , 2005, Robotics: Science and Systems.

[49]  Andrea Vedaldi,et al.  MapNet: An Allocentric Spatial Memory for Mapping Environments , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[51]  Eric P. Xing,et al.  Gated Path Planning Networks , 2018, ICML.

[52]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[53]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[54]  Basilio Bona,et al.  Active SLAM and Exploration with Particle Filters Using Kullback-Leibler Divergence , 2014, J. Intell. Robotic Syst..

[55]  Stefan Kohlbrecher,et al.  A flexible and scalable SLAM system with full 3D motion estimation , 2011, 2011 IEEE International Symposium on Safety, Security, and Rescue Robotics.

[56]  Peter Auer,et al.  Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..

[57]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[58]  Jitendra Malik,et al.  On Evaluation of Embodied Navigation Agents , 2018, ArXiv.

[59]  Silvio Savarese,et al.  Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).