Supervised distance metric learning through maximization of the Jeffrey divergence

Over the past decades, distance metric learning has attracted considerable interest in machine learning and related fields. In this work, we propose an optimization framework for distance metric learning via linear transformations that maximizes the Jeffrey divergence between two multivariate Gaussian distributions derived from local pairwise constraints. In our method, the distance metric is trained on positive and negative difference spaces, built from the neighborhood of each training instance, so that local discriminative information is preserved. We show that this problem admits a closed-form solution, avoiding tedious iterative optimization procedures. The solution is easy to implement and tractable for large-scale problems. Experimental results are presented for both a linear and a kernelized version of the proposed method for k-nearest neighbors classification. We obtain classification accuracies superior to state-of-the-art distance metric learning methods in several cases, while remaining competitive in others.

Highlights
- We propose a novel distance metric learning method (DMLMJ) for classification.
- DMLMJ is simple to implement and can be solved analytically.
- We extend DMLMJ into a kernelized version to tackle non-linear problems.
- Experiments on several data sets show the effectiveness of the proposed method.
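To make the pipeline concrete, the following is a minimal sketch (not the authors' reference implementation) of the kind of procedure the abstract describes: build positive difference vectors from same-class neighbors and negative difference vectors from different-class neighbors, fit a zero-mean Gaussian (covariance) to each set, and take the linear transformation that maximizes the Jeffrey divergence between the two Gaussians. Under these assumptions the maximizer reduces to an eigenproblem on the product of the inverse of one covariance with the other, with directions ranked by λ + 1/λ; the function name `dmlmj_sketch` and all parameter choices here are illustrative.

```python
import numpy as np

def dmlmj_sketch(X, y, k=3, d_out=None):
    """Hedged sketch of Jeffrey-divergence metric learning.

    Builds positive differences (same-class k-neighbors) and negative
    differences (different-class k-neighbors), fits a covariance to each,
    and returns the projection whose columns are eigenvectors of
    Sigma_pos^{-1} Sigma_neg ranked by lambda + 1/lambda (the closed-form
    maximizer of the Jeffrey divergence between the two Gaussians).
    """
    n, dim = X.shape
    d_out = d_out or dim
    pos_diffs, neg_diffs = [], []
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        order = np.argsort(dists)[1:]              # skip the point itself
        same = [j for j in order if y[j] == y[i]][:k]
        diff = [j for j in order if y[j] != y[i]][:k]
        pos_diffs.extend(X[j] - X[i] for j in same)
        neg_diffs.extend(X[j] - X[i] for j in diff)
    S = np.cov(np.asarray(pos_diffs).T, bias=True)  # positive difference space
    D = np.cov(np.asarray(neg_diffs).T, bias=True)  # negative difference space
    # Eigenproblem on S^{-1} D; clip eigenvalues for numerical safety.
    evals, evecs = np.linalg.eig(np.linalg.solve(S, D))
    evals = np.clip(evals.real, 1e-12, None)
    idx = np.argsort(evals + 1.0 / evals)[::-1][:d_out]
    return evecs.real[:, idx]                       # columns = projection L
```

A learned `L` would then be used by projecting the data (`X @ L`) and running a standard Euclidean k-nearest neighbors classifier in the projected space.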
