Simultaneous Mapping and Target Driven Navigation

This work presents a modular architecture for simultaneous mapping and target driven navigation in indoors environments. The semantic and appearance stored in 2.5D map is distilled from RGB images, semantic segmentation and outputs of object detectors by convolutional neural networks. Given this representation, the mapping module learns to localize the agent and register consecutive observations in the map. The navigation task is then formulated as a problem of learning a policy for reaching semantic targets using current observations and the up-to-date map. We demonstrate that the use of semantic information improves localization accuracy and the ability of storing spatial semantic map aids the target driven navigation policy. The two modules are evaluated separately and jointly on Active Vision Dataset and Matterport3D environments, demonstrating improved performance on both localization and navigation tasks.

[1]  Silvio Savarese,et al.  Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Devendra Singh Chaplot,et al.  Modular Visual Navigation using Active Neural Mapping , 2019 .

[3]  Stefan Lee,et al.  Embodied Question Answering in Photorealistic Environments With Point Cloud Perception , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[5]  Jana Kosecka,et al.  Visual Representations for Semantic Target Driven Navigation , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[6]  Vijay Kumar,et al.  Memory Augmented Control Networks , 2017, ICLR.

[7]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Ruslan Salakhutdinov,et al.  Neural Map: Structured Memory for Deep Reinforcement Learning , 2017, ICLR.

[10]  Yuandong Tian,et al.  Building Generalizable Agents with a Realistic and Rich 3D Environment , 2018, ICLR.

[11]  Jana Kosecka,et al.  Joint Semantic Segmentation and Depth Estimation with Deep Convolutional Networks , 2016 .

[12]  Jana Kosecka,et al.  A dataset for developing and benchmarking active vision , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[13]  Ali Farhadi,et al.  IQA: Visual Question Answering in Interactive Environments , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[15]  John J. Leonard,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[16]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).

[17]  Xin Ye,et al.  Active Object Perceiver: Recognition-Guided Policy Learning for Object Searching on Mobile Robots , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[18]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[19]  Tao Chen,et al.  Learning Exploration Policies for Navigation , 2019, ICLR.

[20]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[21]  Xin Ye,et al.  GAPLE: Generalizable Approaching Policy LEarning for Robotic Object Searching in Indoor Environment , 2018, IEEE Robotics and Automation Letters.

[22]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[23]  Fereshteh Sadeghi,et al.  DIViS: Domain Invariant Visual Servoing for Collision-Free Goal Reaching , 2019, Robotics: Science and Systems.

[24]  Andrea Vedaldi,et al.  MapNet: An Allocentric Spatial Memory for Mapping Environments , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[26]  Stefan Lee,et al.  Embodied Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[27]  Wolfram Burgard,et al.  Neural SLAM: Learning to Explore with External Memory , 2017, 1706.09520.

[28]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).