Theoretical and Empirical Criteria for the Edited Nearest Neighbour Classifier

We aim to dispel the blind faith in theoretical criteria for optimising the edited nearest neighbour classifier and its variant, the Voronoi classifier. Three criteria from past and recent literature are considered: two bounds based on the Vapnik-Chervonenkis (VC) dimension and a probabilistic criterion derived by a Bayesian approach. We demonstrate the shortcomings of these criteria for selecting the best reference set, and summarise alternative empirical criteria found in the literature.
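As an illustration of what an empirical criterion for a reference set can look like, the sketch below scores candidate reference sets by the 1-NN error on a held-out validation set. It is only one possible empirical criterion and is not taken from the paper; the function names (`sq_dist`, `nn_predict`, `holdout_error`) and the toy data are hypothetical and chosen purely for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_predict(ref_X: Sequence[Point], ref_y: Sequence[int], x: Point) -> int:
    """Label of the reference point nearest to x (1-NN rule)."""
    nearest = min(range(len(ref_X)), key=lambda i: sq_dist(ref_X[i], x))
    return ref_y[nearest]


def holdout_error(ref_idx: List[int],
                  train_X: Sequence[Point], train_y: Sequence[int],
                  val_X: Sequence[Point], val_y: Sequence[int]) -> float:
    """Empirical criterion (assumed for illustration): fraction of validation
    points misclassified by 1-NN when only train points in ref_idx are kept."""
    ref_X = [train_X[i] for i in ref_idx]
    ref_y = [train_y[i] for i in ref_idx]
    wrong = sum(nn_predict(ref_X, ref_y, x) != y for x, y in zip(val_X, val_y))
    return wrong / len(val_X)


# Toy data (hypothetical): the last training point carries a noisy label.
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (0.2, 0.1)]
train_y = [0, 0, 1, 1, 1]
val_X = [(0.15, 0.1), (0.95, 0.9), (0.05, 0.1), (1.1, 1.0)]
val_y = [0, 1, 0, 1]

full_set = [0, 1, 2, 3, 4]    # all training points as the reference set
edited_set = [0, 1, 2, 3]     # candidate reference set without the noisy point

full_err = holdout_error(full_set, train_X, train_y, val_X, val_y)
edited_err = holdout_error(edited_set, train_X, train_y, val_X, val_y)
print(full_err, edited_err)
```

On this toy data the edited set scores lower than the full set, so a search over reference sets guided by this criterion would prefer it. A held-out estimate of this kind depends on how the validation data are drawn, which is an assumption of the sketch rather than a claim of the paper.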
