A memetic algorithm for evolutionary prototype selection: A scaling up approach

The prototype selection problem consists of reducing the size of databases by removing samples that are considered noisy or not influential in nearest neighbour classification tasks. Evolutionary algorithms have recently been used for prototype selection with good results. However, as the size of the databases increases, the complexity of the problem grows and the behaviour of evolutionary algorithms can deteriorate considerably because of a lack of convergence. This additional problem is known as the scaling up problem. Memetic algorithms are heuristic search approaches for optimization problems that combine a population-based algorithm with a local search. In this paper, we propose a memetic algorithm model that incorporates an ad hoc local search specifically designed for optimizing the properties of the prototype selection problem, with the aim of tackling the scaling up problem. To check its performance, we have carried out an empirical study that compares our proposal with previous evolutionary and non-evolutionary approaches studied in the literature. The results have been analysed using non-parametric statistical procedures and show that our approach outperforms previously studied methods, especially when the database scales up.
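As an illustration of the general idea only, the combination of a population-based search with a local search for prototype selection might be sketched as follows. This is a minimal sketch under stated assumptions, not the model proposed in the paper: the function names, the binary-mask encoding, the weighted fitness (accuracy and reduction balanced by a hypothetical `alpha`), the steady-state replacement scheme, and the greedy removal local search are all assumptions introduced here for illustration.

```python
import random


def nn_accuracy(mask, X, y):
    """Leave-one-out 1-NN accuracy using only samples with mask[i] == 1 as prototypes."""
    protos = [i for i, m in enumerate(mask) if m]
    hits = 0
    for i, xi in enumerate(X):
        # A sample may not act as its own nearest neighbour.
        candidates = [j for j in protos if j != i]
        if not candidates:
            continue  # no usable prototype: counted as a miss
        nearest = min(
            candidates,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(xi, X[j])),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)


def fitness(mask, X, y, alpha=0.5):
    """Assumed fitness: weighted sum of accuracy and reduction rate."""
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * nn_accuracy(mask, X, y) + (1 - alpha) * reduction


def local_search(mask, X, y):
    """Illustrative greedy local search: drop a prototype if fitness does not decrease."""
    best = list(mask)
    best_fit = fitness(best, X, y)
    for i in range(len(best)):
        if best[i]:
            best[i] = 0
            f = fitness(best, X, y)
            if f >= best_fit:
                best_fit = f   # keep the removal
            else:
                best[i] = 1    # undo the removal
    return best


def memetic_select(X, y, pop_size=10, generations=20, seed=0):
    """Steady-state evolutionary loop in which each offspring is refined by local search."""
    rng = random.Random(seed)
    n = len(X)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        p1, p2 = pop[0], pop[1]
        child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
        k = rng.randrange(n)
        child[k] = 1 - child[k]                             # single bit-flip mutation
        child = local_search(child, X, y)                   # the "memetic" refinement step
        if fitness(child, X, y) > fitness(pop[-1], X, y):
            pop[-1] = child                                 # replace the worst individual
    return max(pop, key=lambda m: fitness(m, X, y))
```

Calling `memetic_select(X, y)` with `X` as a list of numeric feature tuples and `y` as the matching list of class labels returns a 0/1 mask marking the retained prototypes. Every fitness evaluation here scans the whole data set, which hints at why cost and convergence become a concern as the database grows.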