An approach to supervised distance metric learning based on difference of convex functions programming

Abstract Distance metric learning has attracted a great deal of research in recent years owing to its robustness across many pattern recognition problems. In this paper, we develop a supervised distance metric learning method that aims to improve the performance of nearest-neighbor classification. Our method is inspired by the large-margin principle and minimizes an objective function defined as a sum of margin violations. Because the margin violations are measured with the ramp loss function, this objective function is nonconvex, which makes it harder to optimize. To address this difficulty, we formulate our distance metric learning problem as an instance of difference of convex functions (DC) programming. This allows us to design a more robust method than would be possible with standard optimization techniques. The effectiveness of this method is demonstrated empirically through extensive experiments on several standard benchmark data sets.
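To illustrate why a ramp-loss objective fits the DC framework, the following minimal sketch writes the ramp loss as the difference of two convex hinge functions. The parameterization used here (margin 1 and a clipping parameter `s` < 1) and the function names are assumptions made for illustration; the abstract does not specify the exact form used in the paper.

```python
# Minimal sketch (assumed parameterization, not taken from the paper):
# ramp loss R_s(z) = min(1 - s, max(0, 1 - z)) with s < 1,
# written as the difference of two convex hinge functions.

def hinge(z, a=1.0):
    """Convex hinge function H_a(z) = max(0, a - z)."""
    return max(0.0, a - z)

def ramp(z, s=0.0):
    """Ramp loss: the hinge loss clipped at the value 1 - s (nonconvex)."""
    return min(1.0 - s, hinge(z, 1.0))

def ramp_dc(z, s=0.0):
    """Same ramp loss as a DC decomposition: H_1(z) - H_s(z)."""
    return hinge(z, 1.0) - hinge(z, s)

if __name__ == "__main__":
    # The two forms agree for every margin value z.
    for z in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0):
        print(z, ramp(z), ramp_dc(z))
```

Because both hinge terms are convex, a DC algorithm can linearize the subtracted term at the current iterate and solve the resulting convex subproblem, repeating until convergence. Clipping the loss at 1 - s also bounds the contribution of any single badly violated margin, which is the usual motivation for preferring the ramp loss to the hinge loss.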
