See and Think: Disentangling Semantic Scene Completion

Semantic scene completion predicts the volumetric occupancy and object category of a 3D scene, which helps intelligent agents understand and interact with their surroundings. In this work, we propose a disentangled framework that sequentially carries out 2D semantic segmentation, 2D-3D reprojection, and 3D semantic scene completion. This three-stage framework has three advantages: (1) explicit semantic segmentation significantly boosts performance; (2) flexible ways of fusing sensor data give the framework good extensibility; (3) progress in any subtask promotes the holistic performance. Experimental results show that, whether the input is a single depth image or an RGB-D image, our framework generates high-quality semantic scene completion and outperforms state-of-the-art approaches on both synthetic and real datasets.
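The 2D-3D reprojection stage can be illustrated with a minimal sketch that lifts per-pixel labels from a 2D segmenter into a voxel grid using the depth map. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a voxel grid expressed directly in the camera frame (z forward, y down, no extrinsic or gravity alignment), and a majority vote when several pixels fall into the same voxel. The function `reproject_labels` and all of its parameters are hypothetical; this is not the authors' implementation. It only fills voxels on the visible surface; inferring the occluded voxels is left to the 3D completion stage.

```python
import math
from collections import Counter, defaultdict


def reproject_labels(depth, labels, fx, fy, cx, cy,
                     voxel_size, grid_shape, origin):
    """Back-project per-pixel semantic labels into a sparse voxel grid.

    Hypothetical helper (assumed pinhole model, camera-frame grid):
    depth:  H x W nested list of metric depths (0 = missing measurement)
    labels: H x W nested list of integer class ids from the 2D segmenter
    Returns a dict mapping (i, j, k) voxel indices to a class id.
    """
    votes = defaultdict(Counter)
    nx, ny, nz = grid_shape
    ox, oy, oz = origin
    for v, (drow, lrow) in enumerate(zip(depth, labels)):
        for u, (z, label) in enumerate(zip(drow, lrow)):
            if z <= 0:
                continue  # skip pixels without a valid depth reading
            # Pinhole back-projection of pixel (u, v) into camera coordinates.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Quantize the 3D point into voxel indices relative to the origin.
            i = math.floor((x - ox) / voxel_size)
            j = math.floor((y - oy) / voxel_size)
            k = math.floor((z - oz) / voxel_size)
            if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
                votes[(i, j, k)][label] += 1
    # Assumed fusion rule: majority vote among pixels sharing a voxel.
    return {idx: c.most_common(1)[0][0] for idx, c in votes.items()}


grid = reproject_labels(
    depth=[[1.0, 1.0], [1.0, 0.0]], labels=[[1, 1], [2, 3]],
    fx=2.0, fy=2.0, cx=0.5, cy=0.5,
    voxel_size=0.5, grid_shape=(4, 4, 4), origin=(-1.0, -1.0, 0.0))
print(grid)  # {(1, 1, 2): 1, (2, 1, 2): 1, (1, 2, 2): 2}
```

Keeping the reprojection as a separate, parameter-free geometric step is what lets the segmenter in front of it be swapped (depth-only or RGB-D input) without retraining the geometry, which matches the extensibility argument in the abstract.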
