Emphasizing 3D Properties in Recurrent Multi-View Aggregation for 3D Shape Retrieval

Multi-view based shape descriptors have achieved impressive performance for 3D shape retrieval. The core of view-based methods is to interpret 3D structures through 2D observations. However, most existing methods pay more attention to discriminative models and none of them necessarily incorporate the 3D properties of the objects. To resolve this problem, we propose an encoder-decoder recurrent feature aggregation network (ERFA-Net) to emphasize the 3D properties of 3D shapes in multi-view features aggregation. In our network, a view sequence of the shape is trained to encode a discriminative shape embedding and estimate unseen rendered views of any viewpoints. This generation task gives an effective supervision which makes the network exploit 3D properties of shapes through various 2D images. During feature aggregation, a discriminative feature representation across multiple views is effectively exploited based on LSTM network. The proposed 3D representation has following advantages against other state-of-the-art: 1) it performs robust discrimination under the existence of noise such as view missing and occlusion, because of the improvement brought by 3D properties. 2) it has strong generative capabilities, which is useful for various 3D shape tasks. We evaluate ERFA-Net on two popular 3D shape datasets, ModelNet and ShapeNetCore55, and ERFA-Net outperforms the state-of-the-art methods significantly. Extensive experiments show the effectiveness and robustness of the proposed 3D representation.

[1]  Leonidas J. Guibas,et al.  Volumetric and Multi-view CNNs for Object Classification on 3D Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Stefan Leutenegger,et al.  Pairwise Decomposition of Image Sequences for Active Multi-view Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Ioannis Pratikakis,et al.  PANORAMA: A 3D Shape Descriptor Based on Panoramic Views for Unsupervised 3D Object Retrieval , 2010, International Journal of Computer Vision.

[4]  Abhinav Gupta,et al.  Learning a Predictable and Generative Vector Representation for Objects , 2016, ECCV.

[5]  Longin Jan Latecki,et al.  3D Shape Matching via Two Layer Coding , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[8]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[9]  Edward K. Wong,et al.  Deepshape: Deep learned shape descriptor for 3D shape matching and retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Zhichao Zhou,et al.  DeepPano: Deep Panoramic Representation for 3-D Shape Recognition , 2015, IEEE Signal Processing Letters.

[12]  Bo Li,et al.  Large-Scale 3D Shape Retrieval from ShapeNet Core55 , 2016, 3DOR@Eurographics.

[13]  Song Bai,et al.  Deep learning representation using autoencoder for 3D shape retrieval , 2014, Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics (SPAC).

[14]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[16]  Szymon Rusinkiewicz,et al.  Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors , 2003, Symposium on Geometry Processing.

[17]  Yue Gao,et al.  3-D Object Retrieval and Recognition With Hypergraph Analysis , 2012, IEEE Transactions on Image Processing.

[18]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[19]  Honglak Lee,et al.  Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision , 2016, NIPS.

[20]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[21]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Leonidas J. Guibas,et al.  SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[24]  Meng Wang,et al.  3D deep shape descriptor , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yu Liu,et al.  Rethinking Feature Discrimination and Polymerization for Large-scale Recognition , 2017, ArXiv.

[26]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[27]  Ming Ouhyoung,et al.  On Visual Similarity Based 3D Model Retrieval , 2003, Comput. Graph. Forum.

[28]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Yu Liu,et al.  3D object understanding with 3D Convolutional Neural Networks , 2016, Inf. Sci..

[30]  Joshua B. Tenenbaum,et al.  Deep Convolutional Inverse Graphics Network , 2015, NIPS.

[31]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[32]  Masaki Aono,et al.  Food Image Recognition Using Covariance of Convolutional Layer Feature Maps , 2016, IEICE Trans. Inf. Syst..

[33]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[34]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[35]  Longin Jan Latecki,et al.  GIFT: A Real-Time and Scalable 3D Shape Search Engine , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Mateu Sbert,et al.  3D shape retrieval using viewpoint information‐theoretic measures , 2013, Comput. Animat. Virtual Worlds.