Paper information - Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion

Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion

Semantic Scene Completion (SSC) aims to simultaneously predict the volumetric occupancy and semantic category of a 3D scene. It helps intelligent devices understand and interact with their surroundings. Because of high memory requirements, current methods produce only low-resolution completion predictions and generally lose object details. They also ignore multi-scale spatial contexts, which play a vital role in 3D inference. To address these issues, in this work we propose a novel deep learning framework, named Cascaded Context Pyramid Network (CCPNet), to jointly infer the occupancy and semantic labels of a volumetric 3D scene from a single depth image. The proposed CCPNet improves labeling coherence with a cascaded context pyramid. Meanwhile, based on low-level features, it progressively restores the fine structures of objects with Guided Residual Refinement (GRR) modules. Our proposed framework has three advantages: (1) it explicitly models the 3D spatial context to improve performance; (2) it produces full-resolution 3D volumes with structure-preserving details; (3) the resulting models are lightweight, have low memory requirements, and offer good extensibility. Extensive experiments demonstrate that, despite taking only a single-view depth map as input, our proposed framework generates high-quality SSC results and outperforms state-of-the-art approaches on both the synthetic SUNCG and real NYU datasets.
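
To make the task statement concrete, the following is a minimal sketch of how an SSC prediction can be represented: a 3D grid holding one class label per voxel, from which the occupancy volume follows directly. It is an illustration only, not the CCPNet implementation. The label ids (0 for empty space, 1 and 2 for two object classes), the 2x2x2 grid, and the helper names `occupancy_from_labels` and `class_counts` are all assumptions introduced here.

```python
from typing import List

# Hypothetical label convention (an assumption, not taken from the paper):
# 0 = empty space, any other id = an occupied voxel of that semantic class.
EMPTY = 0

Volume = List[List[List[int]]]  # nested lists indexed as [x][y][z]


def occupancy_from_labels(labels: Volume) -> Volume:
    """Return a binary volume: 1 where a voxel carries a non-empty class, else 0."""
    return [
        [[1 if v != EMPTY else 0 for v in row] for row in plane]
        for plane in labels
    ]


def class_counts(labels: Volume, num_classes: int) -> List[int]:
    """Count how many voxels are assigned to each class id."""
    counts = [0] * num_classes
    for plane in labels:
        for row in plane:
            for v in row:
                counts[v] += 1
    return counts


# A toy 2x2x2 "scene": one joint prediction gives both semantics and occupancy.
labels = [
    [[0, 1], [1, 1]],
    [[0, 0], [2, 2]],
]
occupancy = occupancy_from_labels(labels)
counts = class_counts(labels, num_classes=3)
```

In this representation a single label volume answers both questions the abstract poses: which voxels are occupied, and which category each occupied voxel belongs to. A full-resolution method predicts this grid at the input volume's resolution rather than at a downsampled one.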
