Neural Implicit Dense Semantic SLAM

Visual Simultaneous Localization and Mapping (vSLAM) is a widely used technique in robotics and computer vision that enables a robot to create a map of an unfamiliar environment using a camera sensor while simultaneously tracking its position over time. In this paper, we propose a novel RGBD vSLAM algorithm that can learn a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner. Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping. The mapping network learns the SDF of the scene as well as RGB, depth, and semantic maps of any novel view using only a set of keyframes. Additionally, we extend our pipeline to large scenes by using multiple local mapping networks. Extensive experiments on well-known benchmark datasets confirm that our approach provides robust tracking, mapping, and semantic labeling even with noisy, sparse, or no input depth. Overall, our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.

[1]  Ross B. Girshick,et al.  Segment Anything , 2023, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Martin R. Oswald,et al.  NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM , 2023, ArXiv.

[3]  M. Nießner,et al.  Panoptic Lifting for 3D Scene Understanding with Neural Fields , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Mohammad Mahdi Johari,et al.  ESLAM: Efficient Dense SLAM System Based on Hybrid Representation of Signed Distance Fields , 2022, ArXiv.

[5]  Steve A. Adeshina,et al.  Advances in Visual Simultaneous Localisation and Mapping Techniques for Autonomous Vehicles: A Review , 2022, Sensors.

[6]  J. Leonard,et al.  NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields , 2022, 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[7]  Yuhang Ming,et al.  Vox-Fusion: Dense Tracking and Mapping with Voxel-based Neural Implicit Representation , 2022, 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR).

[8]  Winston H. Hsu,et al.  Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping , 2022, 2023 IEEE International Conference on Robotics and Automation (ICRA).

[9]  Lujia Chen,et al.  DM-NeRF: 3D Scene Geometry Decomposition and Manipulation from 2D Images , 2022, ICLR.

[10]  T. Funkhouser,et al.  Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  T. Müller,et al.  Instant neural graphics primitives with a multiresolution hash encoding , 2022, ACM Trans. Graph..

[12]  Martin R. Oswald,et al.  NICE-SLAM: Neural Implicit Scalable Encoding for SLAM , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  A. Schwing,et al.  Masked-attention Mask Transformer for Universal Image Segmentation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Alexander G. Schwing,et al.  Mask2Former for Video Instance Segmentation , 2021, ArXiv.

[15]  Jia Deng,et al.  DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras , 2021, NeurIPS.

[16]  Cristian Sminchisescu,et al.  A Real-Time Online Learning Framework for Joint 3D Reconstruction and Semantic Segmentation of Indoor Scenes , 2021, IEEE Robotics and Automation Letters.

[17]  C. Theobalt,et al.  NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction , 2021, NeurIPS.

[18]  Stefan Leutenegger,et al.  In-Place Scene Labelling and Understanding with Implicit Scene Representation , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Edgar Sucar,et al.  iMAP: Implicit Mapping and Positioning in Real-Time , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[20]  Shimin Hu,et al.  DI-Fusion: Online Implicit 3D Reconstruction with Deep Priors , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Carlos Campos,et al.  ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM , 2020, IEEE Transactions on Robotics.

[22]  Michael Goesele,et al.  The Replica Dataset: A Digital Replica of Indoor Spaces , 2019, ArXiv.

[23]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Matthias Nießner,et al.  ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  I. Reid,et al.  Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age , 2016, IEEE Transactions on Robotics.

[26]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Habibollah Haron,et al.  Review on simultaneous localization and mapping (SLAM) , 2015, 2015 IEEE International Conference on Control System, Computing and Engineering (ICCSCE).

[28]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[29]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Daniel Cremers,et al.  LSD-SLAM: Large-Scale Direct Monocular SLAM , 2014, ECCV.

[31]  Wolfram Burgard,et al.  A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[32]  G. Klein,et al.  Parallel Tracking and Mapping for Small AR Workspaces , 2007, 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality.

[33]  Joachim Hertzberg,et al.  6D SLAM—3D mapping outdoor environments , 2007, J. Field Robotics.

[34]  Olivier Stasse,et al.  MonoSLAM: Real-Time Single Camera SLAM , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Hugh F. Durrant-Whyte,et al.  Simultaneous localization and mapping: part I , 2006, IEEE Robotics & Automation Magazine.

[36]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.