Distance Metric Learning: A Comprehensive Survey

Many machine learning algorithms, such as K Nearest Neighbor (KNN), rely heavily on the distance metric defined over the input data patterns. Distance metric learning aims to learn a distance metric for the input space from a given collection of pairs of similar/dissimilar points, such that the distance relations among the training data are preserved. In recent years, many studies have demonstrated, both empirically and theoretically, that a learned metric can significantly improve performance in classification, clustering, and retrieval tasks. This paper surveys the field of distance metric learning from a principled perspective and includes a broad selection of recent work. In particular, distance metric learning is reviewed under different learning conditions: supervised learning versus unsupervised learning; learning in a global sense versus in a local sense; and a distance matrix based on a linear kernel versus a nonlinear kernel. In addition, this paper discusses a number of techniques that are central to distance metric learning, including convex programming, positive semi-definite programming, kernel learning, dimension reduction, K Nearest Neighbor, large margin classification, and graph-based approaches.
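To make the idea of learning a metric from similar pairs concrete, the following is a minimal sketch, not a method taken from the survey. It assumes a diagonal metric (one non-negative weight per feature, which keeps the implied matrix positive semi-definite) and uses a simple illustrative heuristic: each feature is weighted by the inverse of its mean squared difference over the similar pairs. The function names `learn_diagonal_metric` and `metric_distance` and the toy data are hypothetical.

```python
import math


def learn_diagonal_metric(similar_pairs, eps=1e-9):
    """Illustrative heuristic (an assumption, not from the survey):
    weight each feature by the inverse of its mean squared difference
    across pairs known to be similar. Features that vary a lot inside
    similar pairs get small weights; stable features get large ones."""
    dim = len(similar_pairs[0][0])
    weights = []
    for k in range(dim):
        mean_sq = sum((x[k] - y[k]) ** 2 for x, y in similar_pairs) / len(similar_pairs)
        weights.append(1.0 / (mean_sq + eps))  # eps avoids division by zero
    return weights


def metric_distance(x, y, weights):
    """Weighted Euclidean distance sqrt(sum_k w_k * (x_k - y_k)^2),
    i.e. a Mahalanobis-style distance with a diagonal matrix."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Toy data: feature 0 carries the class signal, feature 1 is noise.
similar_pairs = [
    ((0.0, 0.0), (0.1, 5.0)),
    ((1.0, 4.0), (1.1, -1.0)),
    ((0.0, 3.0), (0.1, -2.0)),
]
weights = learn_diagonal_metric(similar_pairs)

query = (0.0, 0.0)
same_class = (0.1, 4.0)    # close on feature 0, far on the noisy feature
other_class = (1.0, 0.5)   # far on feature 0, close on the noisy feature

unit = [1.0, 1.0]  # plain Euclidean distance for comparison
euclid_same = metric_distance(query, same_class, unit)
euclid_other = metric_distance(query, other_class, unit)
learned_same = metric_distance(query, same_class, weights)
learned_other = metric_distance(query, other_class, weights)

print("Euclidean:", euclid_same, euclid_other)
print("Learned:  ", learned_same, learned_other)
```

On this toy data the plain Euclidean distance places the query nearer to the point from the other class, while the learned weights reverse that ordering, which is the effect a nearest-neighbor classifier would benefit from.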