RGBD Based Dimensional Decomposition Residual Network for 3D Semantic Scene Completion

RGB images differentiate from depth as they carry more details about the color and texture information, which can be utilized as a vital complement to depth for boosting the performance of 3D semantic scene completion (SSC). SSC is composed of 3D shape completion (SC) and semantic scene labeling while most of the existing approaches use depth as the sole input which causes the performance bottleneck. Moreover, the state-of-the-art methods employ 3D CNNs which have cumbersome networks and tremendous parameters. We introduce a light-weight Dimensional Decomposition Residual network (DDR) for 3D dense prediction tasks. The novel factorized convolution layer is effective for reducing the network parameters, and the proposed multi-scale fusion mechanism for depth and color image can improve the completion and segmentation accuracy simultaneously. Our method demonstrates excellent performance on two public datasets. Compared with the latest method SSCNet, we achieve 5.9% gains in SC-IoU and 5.7% gains in SSC-IOU, albeit with only 21% network parameters and 16.6% FLOPs employed compared with that of SSCNet.

[1]  Seungyong Lee,et al.  RDFNet: RGB-D Multi-level Residual Feature Fusion for Indoor Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[3]  Silvio Savarese,et al.  3D Scene Understanding by Voxel-CRF , 2013, 2013 IEEE International Conference on Computer Vision.

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Gernot Riegler,et al.  OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Hongen Liao,et al.  Efficient Semantic Scene Completion Network with Spatial Group Convolution , 2018, ECCV.

[7]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Adrian Hilton,et al.  Semantic Scene Completion Combining Colour and Depth: preliminary experiments , 2018, ArXiv.

[9]  C. Qi Deep Learning on Point Sets for 3 D Classification and Segmentation , 2016 .

[10]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[11]  Yanning Zhang,et al.  Two-Stream Convolutional Networks for Blind Image Quality Assessment , 2019, IEEE Transactions on Image Processing.

[12]  Gang Wang,et al.  Learning Common and Specific Features for RGB-D Semantic Segmentation with Deconvolutional Networks , 2016, ECCV.

[13]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2008, Commun. ACM.

[14]  Sanja Fidler,et al.  Holistic Scene Understanding for 3D Object Detection with RGBD Cameras , 2013, 2013 IEEE International Conference on Computer Vision.

[15]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[16]  Jitendra Malik,et al.  Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Victor S. Lempitsky,et al.  Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Ian D. Reid,et al.  From Motion Blur to Motion Flow: A Deep Learning Solution for Removing Heterogeneous Motion Blur , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Xin Tong,et al.  View-Volume Network for Semantic Scene Completion from a Single Depth Image , 2018, IJCAI.

[20]  Simon J. Julier,et al.  Structured Prediction of Unobserved Voxels from a Single Depth Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Juergen Gall,et al.  Two Stream 3D Semantic Scene Completion , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[22]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Katsushi Ikeuchi,et al.  Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Andrew W. Fitzgibbon,et al.  KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera , 2011, UIST.

[25]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Yu Liu,et al.  Learning Pairwise Relationship for Multi-object Detection in Crowded Scenes , 2019, ArXiv.

[27]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[28]  Xiangyu Zhang,et al.  ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Derek Hoiem,et al.  Completing 3D object shape from one depth image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yang Liu,et al.  O-CNN , 2017, ACM Trans. Graph..

[31]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Amaury Nègre,et al.  Probabilistic Analysis of Dynamic Scenes and Collision Risks Assessment to Improve Driving Safety , 2011, IEEE Intelligent Transportation Systems Magazine.

[34]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[35]  Andreas Geiger,et al.  Joint 3D Object and Layout Inference from a Single RGB-D Image , 2015, GCPR.

[36]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Jitendra Malik,et al.  Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation , 2015, International Journal of Computer Vision.

[39]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Eduardo Romera,et al.  ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation , 2018, IEEE Transactions on Intelligent Transportation Systems.

[41]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Ruigang Yang,et al.  Learning 3D shape from a single facial image via non-linear manifold embedding and alignment , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Sanja Fidler,et al.  3D Graph Neural Networks for RGBD Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Matthias Nießner,et al.  Matterport3D: Learning from RGB-D Data in Indoor Environments , 2017, 2017 International Conference on 3D Vision (3DV).