Paper Information - Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery

Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery

To navigate autonomously and plan interactions in real-world environments, robots must robustly perceive and map complex, unstructured scenes. Beyond building an internal representation of the observed scene geometry, the key insight toward a truly functional understanding of the environment is the use of higher-level entities during mapping, such as individual object instances. This work presents an approach to incrementally build volumetric object-centric maps during online scanning with a localized RGB-D camera. First, a per-frame segmentation scheme combines an unsupervised geometric approach with instance-aware semantic predictions to detect both recognized scene elements and previously unseen objects. Next, a data association step tracks the predicted instances across frames. Finally, a map integration strategy fuses information about their 3D shape, location, and, if available, semantic class into a global volume. Evaluation on a publicly available dataset shows that the proposed approach for building instance-level semantic maps is competitive with state-of-the-art methods, while additionally being able to discover objects of unseen categories. The system is further evaluated in a real-world robotic mapping setup, where qualitative results highlight the online nature of the method. Code is available at https://github.com/ethz-asl/voxblox-plusplus.
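The data association and map integration steps described in the abstract can be illustrated with a toy example. The following is a minimal sketch under simplifying assumptions, not the authors' implementation and not the voxblox-plusplus API: segments are given as sets of integer voxel coordinates, association picks the existing map label that covers the largest share of a segment, and the class name `InstanceVoxelMap`, its methods, and the `min_overlap` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


class InstanceVoxelMap:
    """Toy voxel map (hypothetical): each voxel counts how often each
    instance label was observed there."""

    def __init__(self, min_overlap=0.5):
        # Assumed threshold: fraction of a segment that must already carry
        # a map label for the segment to be associated with that label.
        self.min_overlap = min_overlap
        self.label_counts = defaultdict(Counter)  # voxel -> {label: count}
        self.semantic_class = {}  # label -> class name, only when predicted
        self._next_label = 1

    def voxel_label(self, voxel):
        """Return the most frequently observed label at a voxel, or None."""
        counts = self.label_counts.get(voxel)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def associate(self, segment_voxels):
        """Data association: reuse the best-overlapping map label,
        otherwise create a fresh label for a newly discovered instance."""
        if not segment_voxels:
            raise ValueError("segment must contain at least one voxel")
        overlap = Counter()
        for voxel in segment_voxels:
            label = self.voxel_label(voxel)
            if label is not None:
                overlap[label] += 1
        if overlap:
            label, hits = overlap.most_common(1)[0]
            if hits / len(segment_voxels) >= self.min_overlap:
                return label
        label = self._next_label
        self._next_label += 1
        return label

    def integrate(self, segment_voxels, semantic_class=None):
        """Map integration: fuse one per-frame segment into the global map.
        The semantic class is optional, so unrecognized objects still get
        an instance label."""
        label = self.associate(segment_voxels)
        for voxel in segment_voxels:
            self.label_counts[voxel][label] += 1
        if semantic_class is not None:
            # Keep the first predicted class; a real system would fuse
            # class predictions over time instead.
            self.semantic_class.setdefault(label, semantic_class)
        return label
```

In this sketch, integrating a segment in one frame and a largely overlapping segment in a later frame yields the same instance label, while a segment in previously unlabeled space receives a new label even when no semantic class is supplied. That mirrors, in a very reduced form, how unrecognized objects can still be mapped as separate instances.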
