Generative Local Metric Learning for Nearest Neighbor Classification

We consider the problem of learning a local metric to improve the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find a local metric that maximally reduces this bias using knowledge from generative models. As a byproduct, the asymptotic analysis in this work relates metric learning to dimensionality reduction, a connection that was not apparent from previous discriminative approaches. Experiments show that the learned local metric improves discriminative nearest neighbor performance on various datasets using simple class-conditional generative models such as a Gaussian.
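To make the setting concrete, the following is a minimal sketch of how a local metric enters nearest neighbor classification, with a maximum-likelihood Gaussian fit standing in for the class-conditional generative model. It is not the metric derived in the paper: the abstract does not state that formula, so `metric_fn` and the `stretch_first_axis` example are hypothetical placeholders, and every function name here is an assumption made for illustration.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance of the rows of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)  # bias=True divides by n (ML estimate)
    return mean, cov

def nearest_neighbor_label(query, X, y, metric_fn):
    """Return the 1-NN label of `query` under the local metric A = metric_fn(query).

    The squared distance to each training point x is (x - query)^T A (x - query),
    so A is evaluated once per query and may vary from one query to the next.
    """
    A = metric_fn(query)                                   # (d, d), assumed symmetric PSD
    diffs = X - query                                      # (n, d)
    sq_dists = np.einsum("ij,jk,ik->i", diffs, A, diffs)   # quadratic form for each row
    return y[np.argmin(sq_dists)]

# Toy data: one training point per class.
X_train = np.array([[0.0, 3.0], [2.0, 0.0]])
y_train = np.array([0, 1])
query = np.array([0.0, 0.0])

def euclidean(q):
    # Identity metric: ordinary Euclidean nearest neighbor.
    return np.eye(2)

def stretch_first_axis(q):
    # Hypothetical placeholder metric, NOT the paper's: it penalizes
    # displacement along the first axis and discounts the second.
    return np.diag([10.0, 0.1])

label_euclidean = nearest_neighbor_label(query, X_train, y_train, euclidean)
label_local = nearest_neighbor_label(query, X_train, y_train, stretch_first_axis)
```

On this toy data the two metrics select different neighbors, which illustrates why the choice of metric matters at finite sample sizes. In the approach the abstract describes, the metric at each query would instead be computed from the fitted class-conditional densities (for example, the outputs of `fit_gaussian` applied to each class).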
