Scalable Large-Margin Distance Metric Learning Using Stochastic Gradient Descent

The key to the success of many machine learning and pattern recognition algorithms is the way distances between input data are computed. In this paper, we propose a large-margin approach, called large-margin distance metric learning (LMDML), for learning a Mahalanobis distance metric. LMDML applies the principle of margin maximization to learn a distance metric that improves $k$-nearest-neighbor classification. The main challenge in distance metric learning is the positive semidefiniteness constraint on the Mahalanobis matrix. Semidefinite programming is commonly used to enforce this constraint, but it becomes computationally intractable on large-scale data sets. To overcome this limitation, we develop an efficient algorithm based on stochastic gradient descent. Our algorithm avoids computing the full gradient and ensures that the learned matrix remains within the positive semidefinite cone after each iteration. Extensive experiments show that the proposed algorithm scales to large data sets and outperforms other state-of-the-art distance metric learning approaches in both classification accuracy and training time.
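The abstract describes the recipe only at a high level: take cheap stochastic steps on a large-margin objective instead of solving a semidefinite program, and keep the Mahalanobis matrix positive semidefinite after every step. The sketch below illustrates that general idea under stated assumptions; it is not the LMDML algorithm. The triplet hinge loss, the Frobenius regularizer, the $1/\sqrt{t}$ step size, and the eigenvalue-clipping projection are standard choices assumed here for illustration, and all function names (`project_psd`, `sample_triplet`, `sgd_metric`, `knn_predict`, `make_data`) are hypothetical. The full eigendecomposition in `project_psd` costs $O(d^3)$ per step and is used only because it is simple; it says nothing about how the paper itself maintains positive semidefiniteness.

```python
import numpy as np


def project_psd(M):
    """Frobenius-norm projection onto the PSD cone: symmetrize, then clip negative eigenvalues."""
    M = 0.5 * (M + M.T)
    eigvals, eigvecs = np.linalg.eigh(M)
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals) @ eigvecs.T


def sample_triplet(y, rng):
    """Rejection-sample (anchor, target, impostor): target shares the anchor's label, impostor does not."""
    n = len(y)
    while True:
        anchor, target, impostor = rng.integers(0, n, size=3)
        if anchor != target and y[anchor] == y[target] and y[anchor] != y[impostor]:
            return anchor, target, impostor


def sgd_metric(X, y, n_iters=3000, eta0=0.05, lam=1e-3, margin=1.0, seed=0):
    """Projected SGD on a hypothetical objective (not the exact LMDML formulation):
    lam/2 * ||M||_F^2 + max(0, margin + d_M(x_a, x_t) - d_M(x_a, x_i)),
    where d_M is the squared Mahalanobis distance.
    """
    rng = np.random.default_rng(seed)
    M = np.eye(X.shape[1])  # start from the Euclidean metric
    for t in range(n_iters):
        a, tg, imp = sample_triplet(y, rng)
        d_t, d_i = X[a] - X[tg], X[a] - X[imp]
        grad = lam * M  # gradient of the Frobenius regularizer
        # Only a margin-violating triplet contributes a hinge subgradient.
        if margin + d_t @ M @ d_t - d_i @ M @ d_i > 0:
            grad = grad + np.outer(d_t, d_t) - np.outer(d_i, d_i)
        eta = eta0 / np.sqrt(t + 1)  # decaying step size
        # One cheap stochastic step, then map back onto the PSD cone.
        M = project_psd(M - eta * grad)
    return M


def knn_predict(M, X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training points under the learned metric."""
    preds = []
    for x in X_query:
        diff = X_train - x
        dists = np.einsum("ij,jk,ik->i", diff, M, diff)  # squared Mahalanobis distances
        nearest = np.argsort(dists)[:k]
        preds.append(np.bincount(y_train[nearest]).argmax())
    return np.array(preds)


# Toy data: feature 0 separates the two classes, feature 1 is pure noise.
data_rng = np.random.default_rng(1)


def make_data(n_per_class):
    X0 = np.column_stack([data_rng.normal(0.0, 0.1, n_per_class), data_rng.normal(0.0, 1.0, n_per_class)])
    X1 = np.column_stack([data_rng.normal(1.0, 0.1, n_per_class), data_rng.normal(0.0, 1.0, n_per_class)])
    return np.vstack([X0, X1]), np.array([0] * n_per_class + [1] * n_per_class)


X_train, y_train = make_data(50)
X_test, y_test = make_data(50)
M = sgd_metric(X_train, y_train)
acc = np.mean(knn_predict(M, X_train, y_train, X_test) == y_test)
print(np.round(M, 3))
print("3-NN test accuracy:", acc)
```

On this toy data one would expect the learned matrix to keep most of its weight on feature 0 and shrink the entry for the noise feature toward zero. Each iteration looks at a single triplet, so the cost of a step does not grow with the number of training points; only the projection depends on the dimension $d$.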