Learning With Batch-Wise Optimal Transport Loss for 3D Shape Recognition

Deep metric learning is essential for visual recognition. The widely used pair-wise (or triplet-based) loss objectives cannot make full use of the semantic information in training samples, nor do they give enough attention to hard samples during optimization. As a result, they often suffer from slow convergence and inferior performance. In this paper, we show how to learn an importance-driven distance metric from batches of samples via optimal transport programming. This approach automatically emphasizes hard examples and significantly improves convergence. We propose a new batch-wise optimal transport loss and integrate it into an end-to-end deep metric learning framework, which we use to jointly learn the distance metric and the deep feature representation for recognition. Empirical results on visual retrieval and classification tasks with six benchmark datasets, i.e., MNIST, CIFAR10, SHREC13, SHREC14, ModelNet10, and ModelNet40, demonstrate the superiority of the proposed method. It accelerates convergence significantly while achieving state-of-the-art recognition performance. For example, in 3D shape recognition experiments, our method achieves better recognition performance within only $5$ epochs than mainstream 3D shape recognition approaches obtain after $200$ epochs.
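The abstract describes using an optimal transport plan computed over a batch so that hard examples receive more weight in the loss. The following is a minimal NumPy sketch of that idea under our own assumptions, not the paper's formulation. The contrastive per-pair cost, the ground metric `exp(-pair_cost)`, the entropic Sinkhorn solver, and the names `pairwise_distances`, `sinkhorn`, `batch_ot_loss`, `margin`, and `reg` are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between rows of a (n x d) and rows of b (m x d)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))


def sinkhorn(ground, reg=0.1, n_iters=200):
    """Entropic OT plan between two uniform marginals (Sinkhorn iterations)."""
    n, m = ground.shape
    r = np.full(n, 1.0 / n)  # source marginal: uniform over batch A
    c = np.full(m, 1.0 / m)  # target marginal: uniform over batch B
    K = np.exp(-ground / reg)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):
        v = c / (K.T @ u)  # rescale to match column sums
        u = r / (K @ v)    # rescale to match row sums
    return u[:, None] * K * v[None, :]


def batch_ot_loss(xa, ya, xb, yb, margin=1.0, reg=0.1):
    """Hypothetical batch-wise OT loss: OT plan used as importance weights.

    Assumption: each pair gets a contrastive cost (squared distance for
    positives, squared hinge on the margin for negatives). The ground metric
    exp(-pair_cost) is small for hard pairs, so the OT plan moves most of its
    mass onto them.
    """
    d = pairwise_distances(xa, xb)
    same = (ya[:, None] == yb[None, :]).astype(float)
    pair_cost = same * d ** 2 + (1.0 - same) * np.maximum(margin - d, 0.0) ** 2
    plan = sinkhorn(np.exp(-pair_cost), reg=reg)
    return float((plan * pair_cost).sum()), plan


if __name__ == "__main__":
    # Two classes whose embeddings nearly coincide: the negatives are "hard".
    x = np.array([[0.0, 0.0], [0.1, 0.0]])
    y = np.array([0, 1])
    loss, plan = batch_ot_loss(x, y, x.copy(), y.copy())
    print("loss:", loss)
    print("plan:\n", plan)
```

Because the ground metric decreases as the pair cost grows, minimizing transport cost pushes mass onto hard pairs (distant positives and nearby negatives), so in the usage example almost all mass lands on the off-diagonal negative pairs. In an autodiff framework one would likely treat `plan` as a constant and back-propagate only through `pair_cost`; that is also an assumption of this sketch.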
