Point2SpatialCapsule: Aggregating Features and Spatial Relationships of Local Regions on Point Clouds Using Spatial-Aware Capsules

Learning discriminative shape representations directly from point clouds remains challenging in 3D shape analysis and understanding. Recent methods usually involve three steps: splitting a point cloud into local regions, extracting a feature for each local region, and aggregating the local region features into a global feature, used as the shape representation, by simple max-pooling. However, such pooling-based feature aggregation does not adequately take into account the spatial relationships between local regions (e.g., their locations relative to other regions), which greatly limits the ability to learn discriminative shape representations. To address this issue, we propose a novel deep learning network, named Point2SpatialCapsule, that aggregates both the features and the spatial relationships of local regions on point clouds in order to learn more discriminative shape representations. Compared with traditional max-pooling-based feature aggregation networks, Point2SpatialCapsule explicitly learns not only the geometric features of local regions but also the spatial relationships among them. Point2SpatialCapsule consists of two main modules. The first module, named geometric feature aggregation, addresses the unordered nature of local regions by aggregating the local region features into learnable cluster centers, explicitly encoding the spatial locations from the original 3D space. The second module, named spatial relationship aggregation, further aggregates the clustered features and the spatial relationships among them in the feature space using the spatial-aware capsules developed in this article. Compared with previous capsule-network-based methods, feature routing on the spatial-aware capsules can learn more discriminative spatial relationships among local regions of point clouds, which establishes a direct mapping between log priors and the spatial locations through feature clusters. Experimental results demonstrate that Point2SpatialCapsule outperforms state-of-the-art methods in 3D shape classification, retrieval, and segmentation tasks on the well-known ModelNet and ShapeNet datasets.
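
The contrast between max-pooling and aggregation into spatially located cluster centers can be illustrated with a minimal sketch. This is not the implementation from the article: the soft-assignment rule (a softmax over negative squared distances), the `sharpness` parameter, the function names, and the toy data are all assumptions made for illustration, and the cluster centers, which would be learned, are fixed by hand here.

```python
import math


def max_pool(features):
    """Element-wise max over region features: order-invariant but position-blind."""
    return [max(column) for column in zip(*features)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_to_centers(positions, features, centers, sharpness=10.0):
    """Softly assign each region to 3D cluster centers and accumulate its feature.

    Assumption: assignment weights come from a softmax over negative squared
    distances between the region position and each center.
    """
    dim = len(features[0])
    clustered = [[0.0] * dim for _ in centers]
    for pos, feat in zip(positions, features):
        scores = [
            -sharpness * sum((p - c) ** 2 for p, c in zip(pos, center))
            for center in centers
        ]
        weights = softmax(scores)
        for k, w in enumerate(weights):
            for d in range(dim):
                clustered[k][d] += w * feat[d]
    return clustered


# Two local regions with hypothetical 3D positions and 2D features.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
features = [[1.0, 0.0], [0.0, 2.0]]
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # fixed here; learnable in practice

# Same two features, but placed at exchanged spatial locations.
swapped = [features[1], features[0]]

pooled_a = max_pool(features)
pooled_b = max_pool(swapped)  # identical to pooled_a: the layout is lost
clustered_a = aggregate_to_centers(positions, features, centers)
clustered_b = aggregate_to_centers(positions, swapped, centers)  # differs from clustered_a
```

In this toy setting, max-pooling returns the same global feature for both spatial layouts, whereas the per-center features change when the regions exchange locations. Both operations remain invariant to the order in which the regions are listed.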
