Learning Attentive and Hierarchical Representations for 3D Shape Recognition

This paper proposes a novel method for 3D shape representation learning, namely Hyperbolic Embedded Attentive Representation (HEAR). Unlike existing multi-view based methods, HEAR develops a unified framework that addresses both multi-view redundancy and single-view incompleteness. Specifically, HEAR first employs a hybrid attention (HA) module, which consists of a view-agnostic attention (VAA) block and a view-specific attention (VSA) block. These two blocks jointly explore distinct but complementary spatial saliency of the local features of each single-view image. Subsequently, a multi-granular view pooling (MVP) module aggregates the multi-view features at different granularities in a coarse-to-fine manner. The resulting feature set implicitly carries hierarchical relations, so it is projected into a hyperbolic space by means of a hyperbolic embedding. A hierarchical representation is then learned by hyperbolic multi-class logistic regression, which is based on hyperbolic geometry. Experimental results show that HEAR outperforms state-of-the-art approaches on three 3D shape recognition tasks: generic 3D shape retrieval, 3D shape classification, and sketch-based 3D shape retrieval.
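The projection into hyperbolic space can be illustrated with a minimal sketch. The abstract does not say which model of hyperbolic space HEAR uses, so the code below assumes the Poincaré ball with curvature -c, a common choice in the hyperbolic neural network literature. The function names `expmap0`, `mobius_add` and `poincare_dist`, the parameter `c`, and the toy feature values are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def expmap0(v, c=1.0, eps=1e-9):
    """Map Euclidean vectors into the Poincare ball (assumed model) via the
    exponential map at the origin: tanh(sqrt(c)*|v|) * v / (sqrt(c)*|v|)."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def mobius_add(x, y, c=1.0):
    """Mobius addition, the ball's counterpart of vector addition."""
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den


def poincare_dist(x, y, c=1.0):
    """Geodesic distance between two points of the Poincare ball."""
    sqrt_c = np.sqrt(c)
    diff_norm = np.linalg.norm(mobius_add(-x, y, c), axis=-1)
    # Clip to stay strictly inside the domain of arctanh.
    diff_norm = np.clip(diff_norm, 0.0, (1.0 - 1e-7) / sqrt_c)
    return 2.0 / sqrt_c * np.arctanh(sqrt_c * diff_norm)


# Hypothetical pooled view features (toy values, not from the paper).
feats = np.array([[0.3, 0.4], [3.0, -4.0]])
emb = expmap0(feats)

# Every embedded point lies strictly inside the unit ball (c = 1).
print(np.linalg.norm(emb, axis=-1))
# Distance between the two embedded features.
print(poincare_dist(emb[0], emb[1]))
```

Under this assumed model, distances grow rapidly towards the boundary of the ball, which is why tree-like (hierarchical) feature sets can be embedded with low distortion. The hyperbolic multi-class logistic regression mentioned in the abstract would operate on points such as `emb`; it is not sketched here because the abstract gives no details of it.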
