3DViewGraph: Learning Global Features for 3D Shapes from A Graph of Unordered Views with Attention

Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views, and the spatial relationship among views, which limits the discriminability of learned features. We propose 3DViewGraph to resolve this issue, which learns 3D global features by more effectively aggregating unordered views with attention. Specifically, unordered views taken around a shape are regarded as view nodes on a view graph. 3DViewGraph first learns a novel latent semantic mapping to project low-level view features into meaningful latent semantic embeddings in a lower dimensional space, which is spanned by latent semantic patterns. Then, the content and spatial information of each pair of view nodes are encoded by a novel spatial pattern correlation, where the correlation is computed among latent semantic patterns. Finally, all spatial pattern correlations are integrated with attention weights learned by a novel attention mechanism. This further increases the discriminability of learned features by highlighting the unordered view nodes with distinctive characteristics and depressing the ones with appearance ambiguity. We show that 3DViewGraph outperforms state-of-the-art methods under three large-scale benchmarks.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Stefan Leutenegger,et al.  Pairwise Decomposition of Image Sequences for Active Multi-view Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  IEEE conference on computer vision and pattern recognition , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[4]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[5]  Dong Tian,et al.  FoldingNet: Point Cloud Auto-Encoder via Deep Grid Deformation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[7]  Bo Li,et al.  Large-Scale 3D Shape Retrieval from ShapeNet Core55 , 2016, 3DOR@Eurographics.

[8]  Song Bai,et al.  Triplet-Center Loss for Multi-view 3D Object Retrieval , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Jiaxin Li,et al.  SO-Net: Self-Organizing Network for Point Cloud Analysis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Karthik Ramani,et al.  Deep Learning 3D Shape Surfaces Using Geometry Images , 2016, ECCV.

[12]  Ioannis Pratikakis,et al.  Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval , 2017, 3DOR@Eurographics.

[13]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Karthik Ramani,et al.  3D Object Classification via Spherical Projections , 2017, 2017 International Conference on 3D Vision (3DV).

[15]  Matthias Zwicker,et al.  View Inter-Prediction GAN: Unsupervised Representation Learning for 3D Shapes by Learning Global Shape Memories to Support Local View Predictions , 2018, AAAI.

[16]  Jure Leskovec,et al.  Representation Learning on Graphs: Methods and Applications , 2017, IEEE Data Eng. Bull..

[17]  Yue Gao,et al.  Multi-Modal Clique-Graph Matching for View-Based 3D Model Retrieval , 2016, IEEE Transactions on Image Processing.

[18]  Junwei Han,et al.  Deep Spatiality: Unsupervised Learning of Spatially-Enhanced Global and Local 3D Features by Deep Neural Network With Coupled Softmax , 2018, IEEE Transactions on Image Processing.

[19]  Junsong Yuan,et al.  Multi-view Harmonized Bilinear Network for 3D Object Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Matthias Zwicker,et al.  Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences , 2018, AAAI.

[21]  Kaleem Siddiqi,et al.  Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition , 2019, BMVC.

[22]  Yuting Su,et al.  Graph-based characteristic view set extraction and matching for 3D model retrieval , 2015, Inf. Sci..

[23]  Yasuyuki Matsushita,et al.  RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Ersin Yumer,et al.  Learning local shape descriptors with view-based convolutional networks , 2017, ArXiv.

[25]  Zhichao Zhou,et al.  DeepPano: Deep Panoramic Representation for 3-D Shape Recognition , 2015, IEEE Signal Processing Letters.

[26]  Junwei Han,et al.  SeqViews2SeqLabels: Learning 3D Global Features via Aggregating Sequential Views by RNN With Attention , 2019, IEEE Transactions on Image Processing.

[27]  Qi Tian,et al.  GIFT: Towards Scalable 3D Shape Retrieval , 2017, IEEE Transactions on Multimedia.

[28]  Max Welling,et al.  Spherical CNNs , 2018, ICLR.

[29]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.