Coarse-to-fine segmentation for indoor scenes with progressive supervision

Abstract

Three-dimensional indoor scene segmentation is difficult because indoor scenes have a natural hierarchical structure and complicated contextual relationships. This paper proposes a 3D scene segmentation method that uses a stacked network to exploit the context and hierarchy in 3D scenes. The method consists of two parts: a stacked network and progressive supervision. The stacked network consists of multiple base segmentation networks; each network's output is concatenated with the raw input to form the input of another network, which provides prior context. Progressive supervision uses a group of coarse-to-fine segmentation labels, generated from the spatial relationships among objects in the scene, to force the network to learn the hierarchy. Experimental results on a regular dataset and a complex dataset demonstrate that progressive supervision is effective and that the method outperforms existing methods in complex scenes.
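The two parts described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the stages run in sequence, that each base segmentation network can be stood in for by a single per-point linear classifier (`LinearStage`, hypothetical), and that the coarse labels come from a fixed lookup table (`fine_to_coarse`, hypothetical) rather than from the spatial relationships the paper uses. Only the forward pass and the summed per-stage loss are shown; training is omitted.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class LinearStage:
    """Hypothetical stand-in for one base segmentation network."""

    def __init__(self, in_dim, num_classes, rng):
        self.W = rng.normal(scale=0.1, size=(in_dim, num_classes))
        self.b = np.zeros(num_classes)

    def __call__(self, x):
        # Per-point class probabilities, shape (num_points, num_classes).
        return softmax(x @ self.W + self.b)


def build_stack(raw_dim, classes_per_stage, rng):
    """One stage per label level; later stages also take the previous output."""
    stages, prev_classes = [], 0
    for num_classes in classes_per_stage:
        stages.append(LinearStage(raw_dim + prev_classes, num_classes, rng))
        prev_classes = num_classes
    return stages


def stacked_forward(stages, points):
    """Concatenate each stage's output to the raw input of the following stage."""
    outputs = []
    prior = np.zeros((points.shape[0], 0))  # the first stage has no prior context
    for stage in stages:
        prior = stage(np.concatenate([points, prior], axis=1))
        outputs.append(prior)
    return outputs


def progressive_loss(outputs, labels_per_stage):
    """Sum of per-stage cross-entropy losses, ordered coarse to fine."""
    total = 0.0
    for probs, labels in zip(outputs, labels_per_stage):
        picked = probs[np.arange(len(labels)), labels]
        total += -np.log(picked + 1e-12).mean()
    return total


rng = np.random.default_rng(0)
points = rng.normal(size=(5, 6))            # 5 points with 6 raw features (assumed xyz + rgb)
fine = np.array([0, 1, 2, 3, 3])            # fine labels, 4 classes
fine_to_coarse = np.array([0, 0, 1, 1])     # hypothetical fine-to-coarse grouping
coarse = fine_to_coarse[fine]               # coarse labels, 2 classes

stages = build_stack(raw_dim=6, classes_per_stage=[2, 4], rng=rng)
outputs = stacked_forward(stages, points)
loss = progressive_loss(outputs, [coarse, fine])
```

In this sketch the second stage sees eight input features per point: the six raw features plus the two coarse class probabilities from the first stage. Supervising every stage at its own label granularity is what makes the supervision progressive.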
