Scalable Large-Margin Mahalanobis Distance Metric Learning

The success of many machine learning algorithms, such as k-nearest neighbor (k-NN) classifiers and k-means clustering, depends heavily on the metric used to calculate distances between data points. An effective way to define such a metric is to learn it from a set of labeled training samples. In this work, we propose a fast and scalable algorithm to learn a Mahalanobis distance metric. The Mahalanobis metric can be viewed as the Euclidean distance metric on input data that have been linearly transformed. Employing the principle of margin maximization to achieve better generalization performance, the algorithm formulates metric learning as a convex optimization problem in which a positive semidefinite (p.s.d.) matrix is the unknown variable. Based on an important theorem that a p.s.d. trace-one matrix can always be represented as a convex combination of multiple rank-one matrices, our algorithm accommodates any differentiable loss function and solves the resulting optimization problem using a specialized gradient descent procedure. Throughout the optimization, the proposed algorithm maintains the positive semidefiniteness of the matrix variable, which is essential for a Mahalanobis metric. Compared with conventional methods such as standard interior-point algorithms or the special solver used in large margin nearest neighbor, our algorithm is much more efficient and scales better. Experiments on benchmark data sets suggest that, compared with state-of-the-art metric learning algorithms, our algorithm achieves comparable classification accuracy with reduced computational complexity.
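
The two structural facts the abstract relies on can be checked numerically. The sketch below is only an illustration under stated assumptions, not the paper's algorithm: the helper names (`mat_vec`, `gram`, `mahalanobis_sq`, `rank_one_mixture`), the random projection `L`, and the sample points are all hypothetical. It shows (a) that a Mahalanobis distance with M = LᵀL equals the squared Euclidean distance after mapping the data through L, and (b) that a convex combination of rank-one matrices built from unit vectors has trace one and yields non-negative squared distances, as expected of a p.s.d. matrix.

```python
import math
import random


def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gram(L):
    """Return M = L^T L, which is p.s.d. by construction."""
    n = len(L[0])
    return [[sum(row[i] * row[j] for row in L) for j in range(n)]
            for i in range(n)]


def mahalanobis_sq(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    return sum(a * b for a, b in zip(d, mat_vec(M, d)))


def rank_one_mixture(vectors, weights):
    """Return sum_i w_i * u_i u_i^T, with each u_i normalized to unit length.

    If the weights are non-negative and sum to one, the result is a
    p.s.d. matrix with trace one.
    """
    n = len(vectors[0])
    M = [[0.0] * n for _ in range(n)]
    for v, w in zip(vectors, weights):
        norm = math.sqrt(sum(c * c for c in v))
        u = [c / norm for c in v]
        for i in range(n):
            for j in range(n):
                M[i][j] += w * u[i] * u[j]
    return M


rng = random.Random(0)  # fixed seed so the run is reproducible

# (a) Mahalanobis distance as Euclidean distance after a linear map.
L = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)]  # hypothetical 2x3 map
x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]
dist_mahalanobis = mahalanobis_sq(gram(L), x, y)
Lx, Ly = mat_vec(L, x), mat_vec(L, y)
dist_euclidean = sum((a - b) ** 2 for a, b in zip(Lx, Ly))

# (b) A convex combination of rank-one matrices has trace one and is p.s.d.
vectors = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)]
weights = [0.5, 0.3, 0.2]  # non-negative, summing to one
Z = rank_one_mixture(vectors, weights)
trace_Z = sum(Z[i][i] for i in range(3))
dist_Z = mahalanobis_sq(Z, x, y)

print(abs(dist_mahalanobis - dist_euclidean) < 1e-9)
print(abs(trace_Z - 1.0) < 1e-9, dist_Z >= 0.0)
```

Building the matrix from rank-one terms with simplex weights keeps it p.s.d. without any explicit projection step, which is one plausible reading of why the rank-one representation matters for maintaining positive semidefiniteness during optimization.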
