Genetic Programming-based Construction of Features for Machine Learning and Knowledge Discovery Tasks

This paper uses genetic programming (GP) to change the representation of input data for machine learners. The focus is feature construction in the learning-from-examples paradigm, where new features are built from the original set of attributes. The paper first introduces a general framework for GP-based feature construction. It then proposes an extended approach in which useful components of the representation (features) are preserved during an evolutionary run, in contrast to the standard approach, where valuable features are often lost during the search. Finally, we present and discuss the results of an extensive computational experiment carried out on several reference data sets. The results show that classifiers induced from the representation enriched with GP-constructed features achieve better classification accuracy on the test set. In particular, the extended approach outperforms the standard approach on some benchmark problems at a statistically significant level.
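
To make the idea of GP-based feature construction more concrete, the following is a minimal Python sketch, not the implementation evaluated in the paper. Everything in it is an illustrative assumption: the function set, the mutation-only search, the single-threshold fitness measure (a stand-in for the accuracy of an induced classifier), and the names `random_tree`, `evaluate`, `mutate`, `fitness`, and `evolve`. The small archive only loosely mirrors the idea of preserving useful features during an evolutionary run, so that a good feature found early is not lost when the population changes.

```python
import operator
import random


def pdiv(a, b):
    """Protected division (assumed convention): return 1.0 for a near-zero divisor."""
    return a / b if abs(b) > 1e-9 else 1.0


OPS = [operator.add, operator.sub, operator.mul, pdiv]  # hypothetical function set


def random_tree(n_attrs, depth, rng):
    """Grow a random expression tree; a leaf is the index of an original attribute."""
    if depth == 0 or rng.random() < 0.3:
        return rng.randrange(n_attrs)
    return (rng.choice(OPS),
            random_tree(n_attrs, depth - 1, rng),
            random_tree(n_attrs, depth - 1, rng))


def evaluate(tree, row):
    """Compute the value of a constructed feature for one example."""
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return op(evaluate(left, row), evaluate(right, row))


def mutate(tree, n_attrs, rng):
    """Subtree mutation: replace a randomly chosen node with a fresh subtree."""
    if isinstance(tree, int) or rng.random() < 0.3:
        return random_tree(n_attrs, 2, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, n_attrs, rng), right)
    return (op, left, mutate(right, n_attrs, rng))


def fitness(tree, X, y):
    """Training accuracy of the best single-threshold split on the feature."""
    values = [evaluate(tree, row) for row in X]
    best = 0.0
    for t in values:
        acc = sum((v > t) == bool(label) for v, label in zip(values, y)) / len(y)
        best = max(best, acc, 1.0 - acc)  # either side of the split may be class 1
    return best


def evolve(X, y, pop_size=30, generations=15, archive_size=3, seed=0):
    """Evolve features; the archive keeps the best ones seen in any generation."""
    rng = random.Random(seed)
    n_attrs = len(X[0])
    population = [random_tree(n_attrs, 3, rng) for _ in range(pop_size)]
    archive = []  # (fitness, tree) pairs preserved across generations
    for _ in range(generations):
        scored = sorted(((fitness(t, X, y), t) for t in population),
                        key=lambda pair: pair[0], reverse=True)
        archive = sorted(archive + scored[:archive_size],
                         key=lambda pair: pair[0], reverse=True)[:archive_size]
        parents = [t for _, t in scored[:max(1, pop_size // 2)]]
        population = [mutate(rng.choice(parents), n_attrs, rng)
                      for _ in range(pop_size)]
    return archive


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(40)]
    y = [int(a * b > 0) for a, b in X]  # class depends on the sign of a product
    for score, tree in evolve(X, y):
        print(round(score, 3), tree)
```

In the toy data of the usage example, neither original attribute separates the classes with a single threshold, whereas a constructed feature such as the product of the two attributes does. In a fuller setting, the archived features would be appended to the original attributes and the fitness would be estimated with the actual learner (for example, by cross-validation on the training set), which this sketch deliberately omits.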
