Paper information - DISCOMAN: Dataset of Indoor SCenes for Odometry, Mapping And Navigation

We present a novel dataset for training and benchmarking semantic SLAM methods. The dataset consists of 200 long sequences, each containing 3000-5000 data frames. We generate the sequences from realistic home layouts: we sample trajectories that simulate the motion of a simple home robot and then render frames along those trajectories. Each data frame contains a) RGB images generated using physically-based rendering, b) simulated depth measurements, c) simulated IMU readings and d) a ground-truth occupancy grid of the house. Our dataset serves a wider range of purposes than existing datasets and is the first large-scale benchmark focused on the mapping component of SLAM. The dataset is split into train/validation/test parts sampled from different sets of virtual houses. We present benchmarking results for both classical geometry-based [1], [2] and recent learning-based [3] SLAM algorithms, a baseline mapping method [4], semantic segmentation [5] and panoptic segmentation [6]. The dataset and source code for reproducing our experiments will be publicly available at the time of publication.
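
The abstract states that the train/validation/test parts are drawn from different sets of virtual houses. A minimal sketch of such a house-disjoint split is given below. The function name `split_by_house`, the sequence-to-house mapping, and the 70/15/15 fractions are all assumptions made for illustration; the abstract does not specify the actual split procedure or proportions.

```python
import random
from collections import defaultdict


def split_by_house(sequence_to_house, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole houses to train/val/test so that no house spans two splits.

    sequence_to_house: dict mapping a sequence id to the id of its house
                       (hypothetical identifiers, not the dataset's own).
    fractions: assumed train/val/test proportions over houses; the remainder
               after train and val goes to test.
    """
    # Shuffle the unique houses deterministically.
    houses = sorted(set(sequence_to_house.values()))
    rng = random.Random(seed)
    rng.shuffle(houses)

    n_train = int(len(houses) * fractions[0])
    n_val = int(len(houses) * fractions[1])

    # Decide the split per house, not per sequence.
    house_split = {}
    for i, house in enumerate(houses):
        if i < n_train:
            house_split[house] = "train"
        elif i < n_train + n_val:
            house_split[house] = "val"
        else:
            house_split[house] = "test"

    # Every sequence inherits the split of its house.
    splits = defaultdict(list)
    for seq, house in sorted(sequence_to_house.items()):
        splits[house_split[house]].append(seq)
    return dict(splits)
```

Splitting at the house level rather than the sequence level means a model evaluated on the test part never sees the same layout during training, which is the property the abstract describes.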

[1] Silvio Savarese, et al. Joint 2D-3D-Semantic Data for Indoor Scene Understanding, 2017, ArXiv.

[2] Jan Kautz, et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Daniel Maier, et al. Real-time navigation in 3D environments based on depth camera data, 2012, 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012).

[4] Chenyang Lu, et al. Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Auto-Encoders, 2018, ArXiv.

[5] Qiao Wang, et al. Virtual Worlds as Proxy for Multi-object Tracking Analysis, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Daniel Cremers, et al. FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture, 2016, ACCV.

[7] Carsten Rother, et al. Panoptic Segmentation, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] T. Collins, et al. Occupancy grid mapping: An empirical evaluation, 2007, 2007 Mediterranean Conference on Control & Automation.

[9] Luigi di Stefano, et al. SkiMap: An efficient mapping framework for robot navigation, 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[10] Berthold K. P. Horn, et al. Closed-form solution of absolute orientation using unit quaternions, 1987.

[11] Matthias Nießner, et al. Matterport3D: Learning from RGB-D Data in Indoor Environments, 2017, 2017 International Conference on 3D Vision (3DV).

[12] Ersin Yumer, et al. Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Konstantin Sofiiuk, et al. AdaptIS: Adaptive Instance Selection Network, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14] Wolfram Burgard, et al. OctoMap: an efficient probabilistic 3D mapping framework based on octrees, 2013, Autonomous Robots.

[15] Vladlen Koltun, et al. Open3D: A Modern Library for 3D Data Processing, 2018, ArXiv.

[16] Jianxiong Xiao, et al. SUN RGB-D: A RGB-D scene understanding benchmark suite, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Christian Wolf, et al. Semantic Grid Estimation with a Hybrid Bayesian and Deep Neural Network Approach, 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[18] Matthias Nießner, et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] George Papandreou, et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, 2018, ECCV.

[20] Marc Pollefeys, et al. Slanted Stixels: Representing San Francisco's Steepest Streets, 2017, BMVC.

[21] Stefan Leutenegger, et al. SceneNet RGB-D: Can 5M Synthetic Images Beat Generic ImageNet Pre-training on Indoor Segmentation?, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22] Daniel Cremers, et al. Direct Sparse Odometry, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Gabriele Costante, et al. LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation, 2017, IEEE Robotics and Automation Letters.

[24] Wolfram Burgard, et al. A benchmark for the evaluation of RGB-D SLAM systems, 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[25] Juan D. Tardós, et al. ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras, 2016, IEEE Transactions on Robotics.

[26] Qi Zhao, et al. Egocentric Spatial Memory, 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[27] Wenbin Li, et al. InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset, 2018, BMVC.

[28] Jörg Stückler, et al. The TUM VI Benchmark for Evaluating Visual-Inertial Odometry, 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[29] Roland Siegwart, et al. The EuRoC micro aerial vehicle datasets, 2016, Int. J. Robotics Res.

[30] Andreas Geiger, et al. Vision meets robotics: The KITTI dataset, 2013, Int. J. Robotics Res.

[31] Andrew J. Davison, et al. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM, 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[32] Anton Konushin, et al. Scene Motion Decomposition for Learnable Visual Odometry, 2019, ArXiv.

[33] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.

[34] Derek Hoiem, et al. Indoor Segmentation and Support Inference from RGBD Images, 2012, ECCV.