Completely Lazy Learning

Local classifiers are sometimes called lazy learners because they do not train a classifier until presented with a test sample. However, such methods are generally not completely lazy, because the neighborhood size k (or other locality parameter) is usually chosen by cross-validation on the training set, which can require significant preprocessing and risks overfitting. We propose a simple alternative to cross-validating the neighborhood size that requires no preprocessing: instead of committing to one neighborhood size, average the discriminants over multiple neighborhood sizes. We show that this forms an expected estimated posterior that minimizes the expected Bregman loss with respect to the uncertainty about the neighborhood choice. We analyze this approach for six standard and state-of-the-art local classifiers, including discriminant adaptive nearest neighbor (DANN), a local support vector machine (SVM-KNN), hyperplane distance nearest neighbor (HKNN), and a new local Bayesian quadratic discriminant analysis (local BDA). Experiments on seven benchmark data sets confirm the empirical effectiveness of this technique versus cross-validation, showing that similar classification performance can be attained without any training.
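
To make the averaging step concrete, here is a minimal sketch in Python using plain kNN vote fractions as the per-neighborhood discriminants. The function name completely_lazy_knn and the default range of k values are illustrative assumptions, not taken from the paper; the paper applies the same averaging to richer local classifiers such as DANN, SVM-KNN, HKNN, and local BDA.

```python
import numpy as np

def completely_lazy_knn(X_train, y_train, x_test, k_values=range(1, 51)):
    """Average kNN class-posterior estimates over many neighborhood sizes.

    Instead of cross-validating a single k, compute a posterior estimate
    for every k in k_values and average them, then predict the class with
    the largest averaged posterior. (Sketch; k_values is an assumed
    illustrative default, not a choice made in the paper.)
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)

    # Sort the training labels once by distance to the test point; the
    # k nearest neighbors are then simply the first k entries.
    dists = np.linalg.norm(X_train - np.asarray(x_test, dtype=float), axis=1)
    sorted_labels = y_train[np.argsort(dists)]

    # For each k, the per-class discriminant here is the vote fraction
    # among the k nearest neighbors; average these over all k.
    avg_posterior = np.zeros(len(classes))
    for k in k_values:
        neighbors = sorted_labels[:k]
        avg_posterior += np.array([(neighbors == c).mean() for c in classes])
    avg_posterior /= len(k_values)

    return classes[np.argmax(avg_posterior)], avg_posterior

# Example usage on synthetic two-class data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(int)
label, posterior = completely_lazy_knn(X, y, [0.25, 0.25])
print(label, posterior)
```

Since each per-k vote fraction is itself a valid posterior estimate, their average is one as well, so no per-k selection step is needed; at test time the only work is one distance sort per query, which is exactly the "no training" property claimed above.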
