PointNet-Based Channel Attention VLAD Network

As application scenarios grow more demanding, computer vision is progressively expanding to 3D. Methods that process point clouds directly provide a new paradigm for 3D understanding, and most of them employ max pooling to handle the sparsity and unordered nature of point clouds. However, a max-pooling layer extracts the global feature of the entire point cloud without any learnable parameters, which is heuristic and insufficient. In this paper, we propose a VLAD-enhanced Feature Aggregate Module that aggregates local features adaptively. In addition, a Channel Attention Module is applied to the features to reweight their elements in the high-dimensional feature space. Experiments on both classification and segmentation demonstrate that the proposed method improves the capacity of the baseline to extract more informative features. Specifically, we improve classification accuracy on ModelNet40 from 88.5% to 89.8% and semantic segmentation accuracy on S3DIS from 78.94% to 82.07%.
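To make the contrast between parameter-free max pooling and learnable aggregation concrete, the following is a minimal NumPy sketch. It is not the implementation from this paper: the abstract does not specify the modules, so the aggregation here assumes a NetVLAD-style soft assignment, and the channel attention assumes a squeeze-and-excitation-style sigmoid gate. All function names, shapes, and the randomly initialised weights are hypothetical and stand in for parameters that would be learned.

```python
import numpy as np

def max_pool(features):
    """PointNet-style global feature: per-channel max over N points (no parameters)."""
    return features.max(axis=0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vlad_aggregate(features, centers, w, b):
    """Assumed NetVLAD-style aggregation.
    features: (N, D), centers: (K, D), w: (D, K), b: (K,) -> (K * D,)"""
    assign = softmax(features @ w + b, axis=1)               # (N, K) soft assignment
    residuals = features[:, None, :] - centers[None, :, :]   # (N, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)      # (K, D) weighted residual sum
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)  # per-cluster norm
    vlad = vlad.reshape(-1)
    return vlad / (np.linalg.norm(vlad) + 1e-12)             # global L2 norm

def channel_attention(x, w1, w2):
    """Assumed SE-style gate. x: (C,), w1: (C, H), w2: (H, C) -> (C,)"""
    hidden = np.maximum(x @ w1, 0.0)                  # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))       # sigmoid weights in (0, 1)
    return x * gate

# Hypothetical sizes: N points, D feature channels, K clusters, H bottleneck width.
N, D, K, H = 1024, 64, 8, 32
rng = np.random.default_rng(0)
features = rng.normal(size=(N, D))        # stand-in for per-point features
centers = rng.normal(size=(K, D))         # would be learned
w = rng.normal(size=(D, K)) * 0.1         # would be learned
b = np.zeros(K)
w1 = rng.normal(size=(K * D, H)) * 0.1    # would be learned
w2 = rng.normal(size=(H, K * D)) * 0.1    # would be learned

global_max = max_pool(features)                         # shape (D,)
global_vlad = vlad_aggregate(features, centers, w, b)   # shape (K * D,)
attended = channel_attention(global_vlad, w1, w2)       # shape (K * D,)

# Both aggregators ignore point order, which is what an unordered point set requires.
perm = rng.permutation(N)
vlad_permuted = vlad_aggregate(features[perm], centers, w, b)
```

In this sketch both aggregators are symmetric functions of the point set, but only the VLAD-style one has parameters (`centers`, `w`, `b`) that training could adapt, which is the property the abstract argues max pooling lacks.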
