On visualization and aggregation of nearest neighbor classifiers

Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. A major issue in k-nearest neighbor classification is the choice of the neighborhood parameter k. In practice, this value is generally estimated by cross-validation. However, the ideal value of k in a classification problem depends not only on the entire data set but also on the specific observation to be classified. Instead of using a single value of k, this paper studies results for a finite sequence of classifiers indexed by k. Along with the usual posterior probability estimates, a new measure, called the Bayesian measure of strength, is proposed and investigated as a measure of evidence for different classes. The results of these classifiers and their corresponding estimated misclassification probabilities are displayed visually using shaded strips. These plots provide an effective visualization of the evidence in favor of different classes when a given data point is to be classified. We also propose a simple weighted averaging technique that aggregates the results of the different nearest neighbor classifiers to arrive at the final decision. Based on the analysis of several benchmark data sets, the proposed method is found to perform better than a classifier based on a single value of k.
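The aggregation idea described above can be illustrated with a minimal sketch. It assumes the per-k posterior estimate is simply the fraction of the k nearest neighbors belonging to each class, and it defaults to uniform weights; the abstract does not specify the paper's actual weighting scheme or the definition of the Bayesian measure of strength, so neither is reproduced here. The function names `knn_posteriors` and `aggregate`, and the toy data, are hypothetical.

```python
import numpy as np

def knn_posteriors(X_train, y_train, x, ks, n_classes):
    """For each k in ks, estimate class posteriors at x as the
    fraction of the k nearest training points in each class."""
    dists = np.linalg.norm(X_train - x, axis=1)    # Euclidean distances to x
    order = np.argsort(dists, kind="stable")       # nearest first, ties by index
    sorted_labels = y_train[order]
    post = np.empty((len(ks), n_classes))
    for i, k in enumerate(ks):
        counts = np.bincount(sorted_labels[:k], minlength=n_classes)
        post[i] = counts / k
    return post

def aggregate(post, weights=None):
    """Weighted average of the per-k posterior estimates.
    Uniform weights are an assumption, not the paper's scheme."""
    if weights is None:
        weights = np.ones(post.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize to sum to 1
    avg = weights @ post                           # shape: (n_classes,)
    return int(np.argmax(avg)), avg

# Toy one-dimensional data with two well-separated classes (illustrative only).
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1.5])

post = knn_posteriors(X_train, y_train, x, ks=[1, 3, 5], n_classes=2)
label, avg = aggregate(post)
print(post)    # one row of class proportions per value of k
print(label, avg)
```

Passing a `weights` vector, for instance one derived from cross-validated error estimates for each k, would replace the uniform average; how best to choose such weights is precisely the question the paper addresses.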
