Multi-view based neural network for semantic segmentation on 3D scenes

Dear editor, For semantic segmentation, convolutional neural network (CNN) based methods have become prevalent for both 2D images and 3D data. Traditional methods often rely on local features to segment a target (for example, in [1] both 2D and 3D local features are used to boost recognition ability), but CNN based methods [2,3] perform much better than traditional methods [4]. Among CNN based methods for images, fully convolutional networks (FCNs) [2] were the first to be proposed for end-to-end training, and essentially all subsequent methods are variants of FCNs. For 3D input, some studies use 3D convolution to predict dense 3D semantic voxel maps [5]. However, 3D convolution is limited to low resolution by GPU memory constraints. In addition, RGB information is not well exploited, although it is very important. Because semantic segmentation on images already performs very well, the segmentation results of images can be projected onto the 3D mesh based on their geometric relationship. In this study, we exploit a multi-view based neural network for semantic segmentation on 3D scenes.
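The projection of per-image labels onto a mesh can be illustrated with a minimal sketch. It is not the method of this letter: it assumes a simple pinhole camera per view (hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and a world-to-camera rotation `R` and translation `t`), ignores occlusion, and fuses views by a plain majority vote. The function names `project_vertex` and `fuse_labels` are illustrative only.

```python
from collections import Counter


def project_vertex(vertex, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates (assumed pinhole model).

    Returns (u, v), or None if the point lies behind the camera.
    """
    # Camera-frame coordinates: X_c = R * X_w + t
    xc, yc, zc = (
        sum(R[i][j] * vertex[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        return None
    return fx * xc / zc + cx, fy * yc / zc + cy


def fuse_labels(vertices, views):
    """Give each vertex the majority label over the views it projects into.

    Each view is a dict with keys "R", "t", "fx", "fy", "cx", "cy" and
    "labels" (per-pixel class ids, indexed [row][col]). Occlusion is
    ignored in this sketch; a real pipeline would need a visibility test.
    """
    fused = []
    for vertex in vertices:
        votes = Counter()
        for view in views:
            uv = project_vertex(
                vertex, view["R"], view["t"],
                view["fx"], view["fy"], view["cx"], view["cy"],
            )
            if uv is None:
                continue
            col, row = int(round(uv[0])), int(round(uv[1]))
            labels = view["labels"]
            # Only pixels inside the image contribute a vote.
            if 0 <= row < len(labels) and 0 <= col < len(labels[0]):
                votes[labels[row][col]] += 1
        # Vertices seen by no view stay unlabeled (None).
        fused.append(votes.most_common(1)[0][0] if votes else None)
    return fused
```

In this sketch a vertex that is visible in no view receives `None`; how such vertices are handled in practice, and how occluded views are excluded, are design choices the sketch leaves open.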

[1] Aaron Hertzmann, et al. Learning 3D mesh segmentation and labeling, 2010, ACM Trans. Graph.

[2] Trevor Darrell, et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Peter V. Gehler, et al. Efficient 2D and 3D Facade Segmentation Using Auto-Context, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Long Quan, et al. A robust three-stage approach to large-scale urban scene recognition, 2017, Science China Information Sciences.

[5] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Luc Van Gool, et al. Learning Where to Classify in Multi-view Semantic Segmentation, 2014, ECCV.

[7] Subhransu Maji, et al. 3D Shape Segmentation with Projective Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Matthias Nießner, et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Seunghoon Hong, et al. Learning Deconvolution Network for Semantic Segmentation, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).