Learning Descriptors With Cube Loss for View-Based 3-D Object Retrieval

3-D object retrieval has been a hot research topic in recent years. Within such a field, view-based approaches are attracting increasing attention because of the flexibility of data representation as well as the reported state-of-the-art performance. One of the most important issues related to view-based 3-D object retrieval is how to learn embedding features that are discriminative across classes while being compactly distributed within each class. In this paper, we analyze the difference between the two tasks of classification and retrieval, and propose a novel way to learn a view-pooling feature via a triplet network. In addition, we propose a new loss, named cube loss, which is able to sample a number of triplets equal to the cube of the samples in a batch. With the new loss, both hard-negative and hard-positive pairs can be effectively investigated. The experimental results on the ModelNet benchmark demonstrate that the proposed method achieves superior performance compared to state-of-the-art approaches.

[1]  Raveendran Paramesran,et al.  Image analysis by Krawtchouk moments , 2003, IEEE Trans. Image Process..

[2]  Yue Gao,et al.  GVCNN: Group-View Convolutional Neural Networks for 3D Shape Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Ioannis Pratikakis,et al.  Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval , 2017, 3DOR@Eurographics.

[4]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[5]  Nir Ailon,et al.  Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[6]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Kaleem Siddiqi,et al.  Dominant Set Clustering and Pooling for Multi-View 3D Object Recognition , 2019, BMVC.

[9]  Yi Fang,et al.  3D-A-Nets: 3D Deep Dense Descriptor for Volumetric Shapes with Adversarial Networks , 2017, ArXiv.

[10]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Alireza Khotanzad,et al.  Invariant Image Recognition by Zernike Moments , 1990, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[13]  Yue Gao,et al.  Real-Time Multimedia Social Event Detection in Microblog , 2018, IEEE Transactions on Cybernetics.

[14]  Yue Gao,et al.  Continuous Probability Distribution Prediction of Image Emotions via Multitask Shared Sparse Regression , 2017, IEEE Transactions on Multimedia.

[15]  Yue Gao,et al.  Multi-Modal Clique-Graph Matching for View-Based 3D Model Retrieval , 2016, IEEE Transactions on Image Processing.

[16]  Wei Liu,et al.  Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Yue Gao,et al.  3-D Object Retrieval and Recognition With Hypergraph Analysis , 2012, IEEE Transactions on Image Processing.

[18]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19]  Shih-Fu Chang,et al.  Learning Spread-Out Local Feature Descriptors , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Federico Tombari,et al.  Unique Signatures of Histograms for Local Surface Description , 2010, ECCV.

[22]  Qi Tian,et al.  GIFT: Towards Scalable 3D Shape Retrieval , 2017, IEEE Transactions on Multimedia.

[23]  Gustavo Carneiro,et al.  Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Hong Liu,et al.  Off-the-shelf CNN features for 3D object retrieval , 2017, Multimedia Tools and Applications.

[25]  Ioannis Pratikakis,et al.  Ensemble of PANORAMA-based convolutional neural networks for 3D model classification and retrieval , 2017, Comput. Graph..

[26]  Leonidas J. Guibas,et al.  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yu-Ting Su,et al.  View-Based 3-D Model Retrieval: A Benchmark , 2018, IEEE Transactions on Cybernetics.

[28]  Leonidas J. Guibas,et al.  PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[29]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[30]  Jorge Nocedal,et al.  On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.

[31]  Yue Gao,et al.  Predicting Personalized Image Emotion Perceptions in Social Networks , 2018, IEEE Transactions on Affective Computing.

[32]  Daniel P. Huttenlocher,et al.  Comparing Images Using the Hausdorff Distance , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Yue Gao,et al.  View-Based 3D Object Retrieval: Challenges and Approaches , 2014, IEEE MultiMedia.

[35]  Ioannis Pratikakis,et al.  3D object retrieval via range image queries in a bag-of-visual-words context , 2013, The Visual Computer.

[36]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[37]  Rongrong Ji,et al.  3D object retrieval with multi-feature collaboration and bipartite graph matching , 2016, Neurocomputing.

[38]  Qi Tian,et al.  Less is More: Efficient 3-D Object Retrieval With Query View Selection , 2011, IEEE Transactions on Multimedia.

[39]  Xinbo Gao,et al.  Triplet-Based Deep Hashing Network for Cross-Modal Retrieval , 2018, IEEE Transactions on Image Processing.

[40]  Yue Gao,et al.  Inductive Multi-Hypergraph Learning and Its Application on View-Based 3D Object Classification , 2018, IEEE Transactions on Image Processing.

[41]  Vincent Lepetit,et al.  Learning descriptors for object recognition and 3D pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jiri Matas,et al.  Working hard to know your neighbor's margins: Local descriptor learning loss , 2017, NIPS.