Three new instance selection methods based on local sets: A comparative study with several approaches from a bi-objective perspective

The local set of an instance is the largest hypersphere centered on that instance that contains no instances from any other class. Because of its geometrical nature, this structure can be very helpful for distance-based classification, such as classification based on the nearest neighbor rule. This paper focuses on instance selection for nearest neighbor classification, which, in short, aims to reduce the number of instances in the training set without affecting classification accuracy. Three instance selection methods based on local sets are proposed; they follow different and complementary strategies. In an experimental study involving 26 well-known databases, they are compared with 11 of the most successful state-of-the-art methods in standard and noisy environments. To evaluate their performance, two complementary approaches are applied: the Pareto dominance relation and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). The results achieved by the proposals show that they are among the most effective methods in this field.

Highlights
We propose three selection strategies with different accuracy-reduction tradeoffs.
We assess them on 26 well-known databases, each with more than 1000 instances.
The results are compared with those of 11 successful state-of-the-art methods.
According to different criteria, the new methods are always among the top performers.
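The local set definition given in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `local_set`, the Euclidean metric, and the strict inequality at the boundary are assumptions made for illustration. The radius is taken as the distance to the nearest instance of a different class, and the members are the instances lying strictly inside that hypersphere.

```python
import math

def local_set(index, points, labels):
    """Return (radius, member_indices) for the local set of points[index].

    Assumptions (not from the paper): Euclidean distance, and an instance
    belongs to the local set only if it is strictly closer than the
    nearest instance of another class.
    """
    center = points[index]
    own_label = labels[index]

    # Distances from the center to every instance of a different class.
    enemy_distances = [
        math.dist(center, p)
        for p, y in zip(points, labels)
        if y != own_label
    ]
    if not enemy_distances:
        # Only one class is present: the hypersphere is unbounded.
        return math.inf, list(range(len(points)))

    # The largest class-pure hypersphere reaches up to the nearest enemy.
    radius = min(enemy_distances)
    members = [
        j for j, p in enumerate(points)
        if math.dist(center, p) < radius
    ]
    return radius, members


# Toy one-dimensional layout embedded in 2-D: two instances per class.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = ["a", "a", "b", "b"]

radius, members = local_set(0, points, labels)
print(radius, members)  # 3.0 [0, 1]
```

In this toy layout the nearest instance of the other class lies at distance 3 from the first point, so its local set contains only the two instances of class `a`. A larger local set suggests an instance deep inside its class region, while a small one suggests an instance near a class boundary or a noisy instance.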
