Distance Metric Learning for Kernel Machines

Recent work in metric learning has significantly improved the state of the art in k-nearest neighbor classification. Support vector machines (SVMs), particularly with RBF kernels, are among the most popular classification algorithms that use distance metrics to compare examples. This paper provides an empirical analysis of the efficacy of three of the most popular Mahalanobis metric learning algorithms as pre-processing for SVM training. We show that none of these algorithms generates metrics that lead to particularly satisfying improvements for SVM-RBF classification. As a remedy, we introduce support vector metric learning (SVML), a novel algorithm that seamlessly combines the learning of a Mahalanobis metric with the training of the RBF-SVM parameters. We demonstrate the capabilities of SVML on nine benchmark data sets of varying sizes and difficulties. In our study, SVML outperforms all alternative state-of-the-art metric learning algorithms in terms of accuracy and establishes itself as a serious alternative to the standard Euclidean metric with model selection by cross-validation.
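To make concrete what an RBF kernel under a Mahalanobis metric looks like, the following is a minimal sketch. It is not the SVML algorithm itself. It assumes the metric is parameterized as M = L^T L for some matrix L (a common choice, but an assumption here), so that k(x, z) = exp(-(x - z)^T L^T L (x - z)). The function name `mahalanobis_rbf_kernel` is hypothetical.

```python
import numpy as np

def mahalanobis_rbf_kernel(X, Z, L):
    """RBF kernel under an assumed Mahalanobis metric M = L^T L.

    X: (n, d) array, Z: (m, d) array, L: (r, d) linear map.
    Returns the (n, m) matrix with entries
    exp(-(x - z)^T L^T L (x - z)).
    """
    # Mapping x -> L x turns the Mahalanobis distance into a
    # plain Euclidean distance in the transformed space.
    XL = X @ L.T
    ZL = Z @ L.T
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (XL ** 2).sum(axis=1)[:, None] + (ZL ** 2).sum(axis=1)[None, :] - 2.0 * XL @ ZL.T
    # Clip tiny negative values caused by floating-point round-off.
    sq = np.maximum(sq, 0.0)
    return np.exp(-sq)
```

With L = sqrt(gamma) * I, this reduces to the standard Euclidean RBF kernel exp(-gamma * ||x - z||^2), which is the baseline the abstract compares against. A learned, non-diagonal L instead rescales and rotates the input space before distances are computed.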
