Multi Voxel-Point Neurons Convolution (MVPConv) for Fast and Accurate 3D Deep Learning

We present a new convolutional neural network, called Multi Voxel-Point Neurons Convolution (MVPConv), for fast and accurate 3D deep learning. Previous works adopt either individual point-based features or local-neighboring voxel-based features to process 3D models, which limits model performance because of inefficient computation. Moreover, most existing 3D deep learning frameworks target one specific task, and only a few can handle a variety of tasks. By integrating the advantages of both voxel-based and point-based methods, the proposed MVPConv effectively increases the collection of neighboring information among point-based features and also promotes independence among voxel-based features. By simply replacing the corresponding convolution module with MVPConv, we show that MVPConv fits into different backbones to solve a wide range of 3D tasks. Extensive experiments on benchmark datasets such as ShapeNet Part, S3DIS and KITTI for various tasks show that MVPConv improves the accuracy of the backbone (PointNet) by up to 36%, and achieves higher accuracy than the voxel-based model with up to 34× speedup. In addition, MVPConv outperforms the state-of-the-art point-based models with up to 8× speedup. Notably, MVPConv achieves better accuracy than the newest point-voxel-based model, PVCNN (a model more efficient than PointNet), with lower latency.
