Globally Induced Model Trees: An Evolutionary Approach

In this paper we propose a new evolutionary algorithm for the induction of univariate regression trees that associate leaves with simple linear regression models. In contrast to typical top-down approaches, it searches globally for the best tree structure, the tests in internal nodes, and the models in the leaves. The population of initial trees is created with diverse top-down methods applied to randomly chosen subsamples of the training data. Specialized genetic operators allow the algorithm to evolve regression trees efficiently. Akaike's information criterion (AIC), used as the fitness function, helps to mitigate overfitting. Preliminary experimental validation is promising: the resulting trees can be significantly less complex while offering performance at least comparable to their classical top-down counterparts.
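
To make the fitness measure concrete, the following is a minimal Python sketch of how an AIC-based fitness for a model tree could be computed; the leaf representation, the helper names, and the tree-size penalty are illustrative assumptions rather than the paper's actual implementation.

    import numpy as np

    def leaf_rss_and_params(X, y):
        # Fit a simple linear regression model in a leaf; return the
        # residual sum of squares and the number of fitted coefficients.
        A = np.column_stack([np.ones(len(y)), X])   # intercept column + attributes
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        return float(resid @ resid), A.shape[1]

    def aic_fitness(leaf_data, n_internal_nodes):
        # AIC-style fitness for a candidate model tree (lower is better).
        # leaf_data: list of (X, y) training subsets routed to each leaf.
        # n_internal_nodes is added to the parameter count as a tree-size
        # penalty -- an assumption; the paper's exact complexity term may differ.
        rss, k, n = 0.0, 0, 0
        for X_leaf, y_leaf in leaf_data:
            leaf_rss, leaf_k = leaf_rss_and_params(X_leaf, y_leaf)
            rss += leaf_rss
            k += leaf_k
            n += len(y_leaf)
        k += n_internal_nodes
        # Under Gaussian errors, -2 ln L = n * ln(RSS / n) up to a constant,
        # so the AIC-style score reduces to n * ln(RSS / n) + 2 * k.
        return n * np.log(rss / n) + 2 * k

In this reading, the 2*k penalty grows with both the number of leaf-model coefficients and the number of internal nodes, which is what pushes the evolutionary search toward smaller trees with comparable fit.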
