Feature Manipulation with Genetic Programming

Feature manipulation refers to the process by which the input space of a machine learning task is altered in order to improve learning quality and performance. Three major aspects of feature manipulation are feature construction, feature ranking and feature selection. This thesis proposes a new filter-based methodology for feature manipulation in classification problems using genetic programming (GP). The goal is to modify the input representation of classification problems in order to improve classification performance and reduce the complexity of classification models. The thesis regards a classification problem as a collection of variables, comprising conditional variables (input features) and decision variables (target class labels), and uses GP to discover the relationships between these variables. The types of relationship, and the ways in which they are discovered, vary across the three aspects of feature manipulation.

In feature construction, the thesis proposes a GP-based method that constructs high-level features as functions of the original input features. The functions are evolved by GP using an entropy-based fitness function that maximises the purity of class intervals. Unlike existing algorithms, the proposed method constructs multiple features, and it can effectively perform transformational dimensionality reduction: only a small number of GP-constructed features are needed to preserve good classification performance.

In feature ranking, the thesis proposes two GP-based methods, one for ranking single features and one for ranking subsets of features. In single-feature ranking, the proposed method uses GP to evolve a collection of weak classification models, and then measures the influence of each input feature on classification performance through its contribution to the good models.
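As a rough illustration of the single-feature ranking idea, the sketch below scores each feature by its contribution to the better models in a collection of evolved classifiers. It assumes each evolved weak classifier has already been summarised as the set of features it uses plus its accuracy; that input format, the `top_fraction` cut-off and the summed-accuracy score are hypothetical choices made for illustration, not the exact measure used in the thesis.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def rank_features(
    models: List[Tuple[Set[str], float]], top_fraction: float = 0.5
) -> List[Tuple[str, float]]:
    # Treat only the best-scoring models as "good" (assumed cut-off rule).
    ordered = sorted(models, key=lambda m: m[1], reverse=True)
    n_good = max(1, int(len(ordered) * top_fraction))

    # Hypothetical score: summed accuracy of the good models that use the feature.
    score: Dict[str, float] = defaultdict(float)
    for features, accuracy in ordered[:n_good]:
        for f in features:
            score[f] += accuracy

    # Features that never appear in a good model are still ranked, with score 0.
    for features, _ in models:
        for f in features:
            score.setdefault(f, 0.0)

    # Highest score first; ties broken alphabetically for a stable ranking.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical weak classifiers: (features used, accuracy).
    models = [
        ({"x1", "x2"}, 0.9),
        ({"x1"}, 0.8),
        ({"x3"}, 0.55),
        ({"x2", "x3"}, 0.6),
    ]
    for feature, s in rank_features(models):
        print(feature, round(s, 2))
```

Here `x1` ranks first because it appears in both of the two best models, while `x3` is used only by weaker models and scores zero.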
In ranking subsets of features, a virtual structure for GP trees and a new binary relevance function are proposed to measure the relationship between a subset of features and the target class labels. The proposed method is observed to discover complex relationships, such as multi-modal class distributions and multivariate correlations, that cannot be detected by traditional methods.

In feature selection, the thesis provides a novel multi-objective GP-based approach to measuring the goodness of subsets of features. Subsets are evaluated on their cardinality and on their relationship to the target class labels, and selection is performed by choosing a subset from a GP-discovered Pareto front of sub-optimal solutions (subsets). The thesis also proposes a novel method for measuring the redundancy between input features, which is used to select a subset of relevant features that are not redundant with respect to each other.

In all three aspects of feature manipulation, the proposed GP-based methodology is found to be effective in discovering relationships between the features of a classification task. In feature construction, the proposed GP-based methods evolve functions of conditional variables that can significantly improve classification performance and reduce the complexity of the learned classifiers. In feature ranking, the proposed GP-based methods can find complex relationships between conditional variables and decision variables, and the resulting ranking shows a strong linear correlation with actual classification performance. In feature selection, the proposed GP-based method can find a set of sub-optimal subsets of features that provides a trade-off between the number of features and their relevance to the classification task, and the proposed redundancy removal method can remove redundant features from a set of features.
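The multi-objective selection step described above keeps the subsets that are not dominated on the two objectives, cardinality and relevance to the class labels. The sketch below extracts such a Pareto front, assuming every candidate subset already has a relevance score; in the thesis that score comes from GP, whereas here the `scores` dictionary and its values are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, List, Tuple

Subset = FrozenSet[str]
Scored = Tuple[Subset, float]


def dominates(a: Scored, b: Scored) -> bool:
    # a dominates b if it has no more features and no less relevance,
    # and is strictly better in at least one of the two objectives.
    (sa, ra), (sb, rb) = a, b
    no_worse = len(sa) <= len(sb) and ra >= rb
    strictly_better = len(sa) < len(sb) or ra > rb
    return no_worse and strictly_better


def pareto_front(scores: Dict[Subset, float]) -> List[Scored]:
    items = list(scores.items())
    # Keep every subset that no other subset dominates.
    front = [x for x in items if not any(dominates(y, x) for y in items)]
    # Present the trade-off from smallest to largest subset.
    return sorted(front, key=lambda item: len(item[0]))


if __name__ == "__main__":
    # Hypothetical relevance scores for a few candidate feature subsets.
    scores = {
        frozenset({"a"}): 0.6,
        frozenset({"b"}): 0.4,
        frozenset({"a", "b"}): 0.9,
        frozenset({"a", "c"}): 0.7,
        frozenset({"a", "b", "c"}): 0.9,
    }
    for subset, relevance in pareto_front(scores):
        print(sorted(subset), relevance)
```

With these placeholder scores, `{a}` and `{a, b}` survive: `{a, b, c}` is no more relevant than `{a, b}` but uses an extra feature. The sketch stops at the front; the rule for picking a single subset from it is left open here.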
Both proposed feature selection methods can find an optimal subset of features that yields significantly better classification performance with a much smaller number of features than conventional classification methods.

Produced Publications

1. Kourosh Neshatian, Mengjie Zhang, and Mark Johnston. “Feature Construction and Dimension Reduction Using Genetic Programming”. Proceedings of the 20th Australian Joint Conference on Artificial Intelligence (AI’07), Lecture Notes in Artificial Intelligence, Vol. 4830, Springer, Gold Coast, Australia, December 2007. pp 160-170.
2. Kourosh Neshatian and Mengjie Zhang. “Genetic Programming and Class-Wise Orthogonal Transformation for Dimension Reduction in Classification Problems”. Proceedings of the 11th European Conference on Genetic Programming (EuroGP 2008), Lecture Notes in Computer Science, Vol. 4971, Springer, Napoli, Italy, March 2008. pp 242-253.
3. Kourosh Neshatian and Mengjie Zhang. “Genetic Programming for Performance Improvement and Dimensionality Reduction of Classification Problems”. Proceedings of the 2008 IEEE World Congress on Computational Intelligence (CEC’08), IEEE Press, Hong Kong, June 2008. pp 2811-2818.
4. Kourosh Neshatian, Mengjie Zhang, and Peter Andreae. “Genetic Programming for Feature Ranking in Classification Problems”. Proceedings of the Seventh International Conference on Simulated Evolution and Learning (SEAL’08), Lecture Notes in Computer Science, Vol. 5361, Springer, Melbourne, Australia, December 2008. pp 544-554.