IFS-CoCo: Instance and feature selection based on cooperative coevolution with nearest neighbor rule

Feature selection and instance selection are two effective data reduction processes that can be applied to classification tasks with promising results. Although the two processes are defined separately, they can be applied simultaneously. This paper proposes an evolutionary model that performs feature and instance selection for nearest neighbor classification. It is based on cooperative coevolution, which has been applied successfully to many computational problems. The proposed approach is compared with a wide range of evolutionary feature and instance selection methods for classification. The results, contrasted through non-parametric statistical tests, show that our model outperforms previously proposed evolutionary approaches to data reduction in combination with the nearest neighbor rule.
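To make the idea of simultaneous selection concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes two binary masks (one over instances, one over features), a leave-one-out 1-NN accuracy estimate, and a fitness that mixes accuracy with reduction rate through a hypothetical weight `alpha`. The alternating accept-if-not-worse loop in `coevolve` is a deliberately simplified stand-in for a cooperative coevolutionary search; all function names, parameters, and default values are illustrative assumptions.

```python
import random


def one_nn_predict(query, train_X, train_y, feat_mask):
    """Label of the nearest training row, measured on selected features only."""
    best_label, best_dist = None, float("inf")
    for row, label in zip(train_X, train_y):
        # Squared Euclidean distance restricted to features whose mask bit is 1.
        dist = sum((a - b) ** 2 for a, b, keep in zip(query, row, feat_mask) if keep)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def fitness(X, y, inst_mask, feat_mask, alpha=0.5):
    """Assumed fitness: alpha * accuracy + (1 - alpha) * reduction rate."""
    kept = [i for i, keep in enumerate(inst_mask) if keep]
    if not kept or not any(feat_mask):
        return 0.0
    hits = 0
    for i, (row, label) in enumerate(zip(X, y)):
        # Leave-one-out: a query never uses itself as a reference instance.
        ref = [j for j in kept if j != i]
        if not ref:
            continue
        pred = one_nn_predict(row, [X[j] for j in ref], [y[j] for j in ref], feat_mask)
        hits += pred == label
    accuracy = hits / len(X)
    reduction = 1.0 - (len(kept) * sum(feat_mask)) / (len(X) * len(feat_mask))
    return alpha * accuracy + (1 - alpha) * reduction


def flip(mask, rate, rng):
    """Flip each bit of a 0/1 mask independently with probability `rate`."""
    return [bit ^ (rng.random() < rate) for bit in mask]


def coevolve(X, y, generations=50, rate=0.1, seed=0):
    """Alternately mutate each mask, scoring it with the other mask as collaborator."""
    rng = random.Random(seed)
    inst, feat = [1] * len(X), [1] * len(X[0])
    best = fitness(X, y, inst, feat)
    for _ in range(generations):
        cand = flip(inst, rate, rng)
        score = fitness(X, y, cand, feat)
        if score >= best:
            inst, best = cand, score
        cand = flip(feat, rate, rng)
        score = fitness(X, y, inst, cand)
        if score >= best:
            feat, best = cand, score
    return inst, feat, best
```

Calling `coevolve(X, y)` on a list of numeric rows `X` with labels `y` returns the two masks and the fitness they reach together. Because each mask is always scored jointly with the current mask of the other kind, neither can be evaluated in isolation, which is the basic property that distinguishes a cooperative scheme from running the two selections one after the other.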
