A novel crossover operator based on variable importance for evolutionary multi-objective optimization with tree representation

Selecting reliable predictors is crucial in classification, and decision trees are a popular tool for supervised variable selection and classification. When variable selection must account for acquisition costs, which are incurred whenever a variable is measured for a new observation, balancing the predictive power of the model against its costs becomes a multi-objective optimization problem that can be solved with meta-heuristics such as evolutionary multi-objective algorithms. In this paper, we present a non-hierarchical evolutionary multi-objective tree learner (NHEMOtree) based on genetic programming that uses a binary decision tree representation to handle multi-objective optimization problems with equally important optimization criteria. The tree learner is applied to a multi-objective classification problem from medicine as well as to simulated data, and its performance is evaluated relative to two wrapper approaches based on either NSGA-II or SMS-EMOA with bitstring representation and CART as the enclosed classification algorithm. Moreover, we introduce a novel crossover operator based on a multi-objective variable importance measure. Using this crossover operator improves the performance of NHEMOtree.
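
The trade-off between predictive power and acquisition cost can be illustrated with a minimal Python sketch of Pareto dominance. It is not the NHEMOtree implementation: the function names, the pairing of objectives as (misclassification rate, total cost), and the candidate values are all hypothetical and chosen only for illustration. Both objectives are assumed to be minimized.

```python
from typing import List, Tuple

# Hypothetical objective vector: (misclassification rate, total acquisition cost).
# Both entries are assumed to be minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up candidate models, e.g. trees that use different variable subsets.
candidates = [
    (0.10, 50.0),
    (0.12, 20.0),
    (0.10, 70.0),  # same error as the first candidate but more expensive
    (0.30, 5.0),
    (0.15, 25.0),  # worse than (0.12, 20.0) in both objectives
]

front = pareto_front(candidates)
print(front)
```

With neither criterion prioritized over the other, no single best model exists in general. An evolutionary multi-objective algorithm instead approximates the whole set of non-dominated trade-offs, from which a user can pick a model that matches the available budget.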