Data mining using genetic programming: the implications of parsimony on generalization error

A common data mining heuristic holds that "when choosing between models with the same training error, less complex models should be preferred as they perform better on unseen data." This heuristic may not always hold. In genetic programming, a preference for less complex models is implemented by (i) placing a limit on the size of the evolved program, (ii) penalizing more complex individuals, or both. The paper presents a GP variant with no limit on the complexity of the evolved program that generates highly accurate models on a common dataset.
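The two parsimony mechanisms named above can be illustrated with a minimal sketch. It is not the paper's method: the `Individual` fields, the node-count size measure, the linear penalty weight `alpha`, and the helper names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    training_error: float  # assumed: error rate on the training set, lower is better
    size: int              # assumed complexity measure: node count of the program tree


def within_size_limit(ind, max_size):
    """Mechanism (i): a hard cap on the size of the evolved program."""
    return ind.size <= max_size


def penalized_fitness(ind, alpha):
    """Mechanism (ii): training error plus a linear size penalty (lower is better)."""
    return ind.training_error + alpha * ind.size


def select_best(population, max_size=None, alpha=0.0):
    """Pick the individual with the lowest penalized fitness.

    With max_size=None and alpha=0.0 there is no parsimony pressure at all,
    which corresponds to the unconstrained setting described in the abstract.
    """
    candidates = [
        p for p in population
        if max_size is None or within_size_limit(p, max_size)
    ]
    if not candidates:
        raise ValueError("no individual satisfies the size limit")
    return min(candidates, key=lambda p: penalized_fitness(p, alpha))


population = [
    Individual(training_error=0.10, size=50),
    Individual(training_error=0.10, size=200),
    Individual(training_error=0.05, size=400),
]

unconstrained = select_best(population)
capped = select_best(population, max_size=100)
penalized = select_best(population, alpha=0.001)
```

In this toy population, removing both mechanisms selects the largest program because it has the lowest training error, while either the size cap or the penalty selects the smallest one. Whether the larger program generalizes better is the empirical question the paper addresses; the sketch does not answer it.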
