Hybrid Single Node Genetic Programming for Symbolic Regression

This paper presents a first step in our research on designing an effective and efficient GP-based method for symbolic regression. First, we propose three extensions of the standard Single Node GP (SNGP): (1) a selection strategy that chooses the nodes to be mutated based on their depth and performance, (2) operators that place a compact version of the best-performing graph at the beginning and at the end of the population, respectively, and (3) a local search strategy that applies multiple mutations in each iteration. All proposed modifications have been experimentally evaluated on five symbolic regression benchmarks and compared with standard GP and SNGP. The results are promising and show the potential of the proposed modifications to improve the performance of the SNGP algorithm. We then propose two variants of hybrid SNGP that use a linear regression technique, LASSO, to improve its performance. The proposed algorithms have been compared on four real-world benchmarks with state-of-the-art symbolic regression methods that also make use of linear regression techniques. The results show that the hybrid SNGP algorithms are competitive with, and in some cases better than, the compared methods.
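The abstract does not say how LASSO is coupled with SNGP. One plausible reading, offered here only as an illustrative sketch and not as the authors' method, is that expressions produced by the evolutionary search serve as candidate features, and LASSO fits a sparse linear combination of them. The pure-Python example below shows that idea with a small coordinate-descent LASSO solver. The feature set (`x`, `x*x`, `sin(x)`), the target function, the penalty `lam`, and all function names are assumptions made for the example.

```python
import math

def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def predict(b0, w, columns, i):
    """Model output for sample i: intercept plus weighted feature values."""
    return b0 + sum(w[k] * columns[k][i] for k in range(len(columns)))

def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimise (1/(2n)) * sum_i (y_i - b0 - sum_j w_j * f_ji)^2 + lam * sum_j |w_j|.

    `columns[j]` holds the values of candidate feature j on all samples.
    The intercept b0 is not penalised.
    """
    n = len(y)
    w = [0.0] * len(columns)
    b0 = sum(y) / n
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                col[i] * (y[i] - predict(b0, w, columns, i) + w[j] * col[i])
                for i in range(n)
            ) / n
            z = sum(v * v for v in col) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
        # Re-fit the unpenalised intercept to the current residual.
        b0 = sum(y[i] - predict(0.0, w, columns, i) for i in range(n)) / n
    return b0, w

# Hypothetical candidate features, standing in for evolved expressions.
xs = [-2.0 + 0.1 * i for i in range(41)]
features = [
    [x for x in xs],            # x
    [x * x for x in xs],        # x * x
    [math.sin(x) for x in xs],  # sin(x)
]
# Assumed target for the demo: y = 3 * x^2 + 1.
ys = [3.0 * x * x + 1.0 for x in xs]

b0, w = lasso_coordinate_descent(features, ys, lam=0.01)
mse = sum((ys[i] - predict(b0, w, features, i)) ** 2 for i in range(len(ys))) / len(ys)

# With a very large penalty every feature weight is driven to zero.
b0_big, w_big = lasso_coordinate_descent(features, ys, lam=1e6)
```

In this sketch the L1 penalty keeps only the `x * x` feature and shrinks its weight slightly below 3, which is the sparsifying behaviour that makes LASSO attractive for selecting among many candidate expressions.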
