Stratification for scaling up evolutionary prototype selection

Evolutionary algorithms have recently been applied to prototype selection with good results. An important problem is scaling: evaluating evolutionary prototype selection algorithms on large data sets is costly. In this paper, we offer a proposal to overcome the drawbacks introduced by evaluating large data sets with evolutionary prototype selection algorithms. To do so, we combine a stratified strategy with CHC as a representative evolutionary algorithm model. The study includes a comparison between our proposal and other non-evolutionary prototype selection algorithms combined with the same stratified strategy. The results show that stratified evolutionary prototype selection consistently outperforms the non-evolutionary alternatives, its main advantages being better instance reduction rates, higher classification accuracy, and lower resource consumption.
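The proposal described in the abstract pairs a stratified partition of the training data with the CHC evolutionary algorithm, using 1-NN performance to evaluate candidate prototype subsets. The following is a minimal Python sketch of that idea, not the authors' implementation: the CHC model is heavily simplified (elitist survival plus a HUX-like crossover, without incest prevention or cataclysmic restarts), and all function names and the `alpha` accuracy/reduction weighting are illustrative assumptions.

```python
import random

def one_nn_accuracy(protos, proto_labels, X, y):
    """Accuracy of classifying (X, y) by the nearest prototype."""
    correct = 0
    for xi, yi in zip(X, y):
        j = min(range(len(protos)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(xi, protos[k])))
        correct += proto_labels[j] == yi
    return correct / len(X)

def fitness(mask, X, y, alpha=0.5):
    """Weighted trade-off between 1-NN accuracy and reduction rate."""
    sel = [i for i, bit in enumerate(mask) if bit]
    if not sel:
        return 0.0
    acc = one_nn_accuracy([X[i] for i in sel], [y[i] for i in sel], X, y)
    reduction = 1.0 - len(sel) / len(X)
    return alpha * acc + (1.0 - alpha) * reduction

def evolve_stratum(X, y, generations=15, pop_size=8, seed=0):
    """Simplified CHC-style binary search over one stratum: elitist
    selection plus a HUX-like crossover that swaps half the differing bits."""
    rng = random.Random(seed)
    n = len(X)
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = a[:]
            diff = [i for i in range(n) if a[i] != b[i]]
            for i in rng.sample(diff, len(diff) // 2):
                child[i] = b[i]
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: fitness(m, X, y))
    return [i for i, bit in enumerate(best) if bit]

def stratified_prototype_selection(X, y, n_strata=2, seed=0):
    """Shuffle, split into disjoint strata, evolve a prototype subset in
    each stratum independently, and take the union of the selections."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    selected = []
    for s in range(n_strata):
        part = idx[s::n_strata]
        local = evolve_stratum([X[i] for i in part], [y[i] for i in part],
                               seed=seed + s)
        selected.extend(part[j] for j in local)
    return sorted(selected)
```

The key scaling point is that each evolutionary run touches only one stratum, so the expensive 1-NN fitness evaluation operates on a fraction of the data; the selected prototypes from all strata are then merged into the final reduced set.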
