Gaining Deeper Insights in Symbolic Regression

A distinguishing feature of symbolic regression using genetic programming is its ability to identify complex nonlinear white-box models. This is especially relevant in practice, where models are extensively scrutinized to gain knowledge about the underlying processes. However, this potential is often diluted by the ambiguity and complexity of the models produced by genetic programming. In this contribution we discuss several analysis methods with the common goal of enabling better insight into the symbolic regression process and of producing models that are more understandable and generalize better. To gain more information about the process, we monitor and analyze the progression of population diversity, building block information, and, more generally, genealogy information. Regarding the analysis of results, we present and discuss several aspects such as model simplification, variable relevance, node impacts, and variable network analysis.
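Node impacts are one of the result-analysis aids named above. As a minimal sketch, assuming one common definition (each subtree is replaced by the mean of its outputs over the data, and its impact is the resulting drop in the coefficient of determination R²), they could be computed as follows. The `Node` class, the operator set, and the functions `evaluate`, `iter_nodes`, `r_squared`, and `node_impacts` are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """Minimal expression-tree node (hypothetical representation)."""
    op: str                      # "const", "var", "add", "sub" or "mul"
    value: float = 0.0           # used when op == "const"
    name: str = ""               # used when op == "var"
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node, row: Dict[str, float],
             override: Optional[Dict[int, float]] = None) -> float:
    """Evaluate a tree on one data row; `override` maps id(node) to a
    constant that replaces that whole subtree during evaluation."""
    if override is not None and id(node) in override:
        return override[id(node)]
    if node.op == "const":
        return node.value
    if node.op == "var":
        return row[node.name]
    a, b = (evaluate(c, row, override) for c in node.children)
    if node.op == "add":
        return a + b
    if node.op == "sub":
        return a - b
    if node.op == "mul":
        return a * b
    raise ValueError(f"unknown operator {node.op!r}")


def iter_nodes(node: Node) -> Iterator[Node]:
    """Pre-order traversal over all subtrees (the root included)."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def r_squared(y: List[float], pred: List[float]) -> float:
    """Coefficient of determination, 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    return 1.0 - ss_res / ss_tot


def node_impacts(tree: Node, rows: List[Dict[str, float]],
                 y: List[float]) -> List[Tuple[Node, float]]:
    """Impact of each subtree: R^2 of the original model minus R^2 after
    replacing that subtree by its mean output over the data set."""
    base = r_squared(y, [evaluate(tree, r) for r in rows])
    result = []
    for node in iter_nodes(tree):
        mean_out = sum(evaluate(node, r) for r in rows) / len(rows)
        pred = [evaluate(tree, r, {id(node): mean_out}) for r in rows]
        result.append((node, base - r_squared(y, pred)))
    return result


# Usage: the model 2*x1 + 0.01*x2 applied to data generated by y = 2*x1.
x1_term = Node("mul", children=[Node("const", value=2.0), Node("var", name="x1")])
x2_term = Node("mul", children=[Node("const", value=0.01), Node("var", name="x2")])
model = Node("add", children=[x1_term, x2_term])
rows = [{"x1": 1, "x2": 5}, {"x1": 2, "x2": -3}, {"x1": 3, "x2": 2}, {"x1": 4, "x2": 0}]
y = [2.0, 4.0, 6.0, 8.0]

impacts = node_impacts(model, rows, y)
lookup = {id(n): v for n, v in impacts}
print("impact of 2*x1 term:   ", round(lookup[id(x1_term)], 4))
print("impact of 0.01*x2 term:", round(lookup[id(x2_term)], 4))
```

Because the data follow y = 2·x1, the `2*x1` term should receive an impact close to 1, whereas the `0.01*x2` term scores close to zero (slightly negative here, since replacing it by its mean even removes a small error). Under this definition, subtrees with near-zero impact are natural candidates for model simplification; other quality measures could be substituted for R² in the same way.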