Prioritized grammar enumeration: symbolic regression by dynamic programming

We introduce Prioritized Grammar Enumeration (PGE), a deterministic Symbolic Regression (SR) algorithm that uses dynamic programming techniques. PGE keeps the tree-based representation and Pareto non-dominated sorting of Genetic Programming (GP), but replaces genetic operators and random numbers with grammar production rules and systematic choices. PGE uses non-linear regression and abstract parameters to fit the coefficients of an equation, effectively separating the search for an equation's form from the optimization of that form's coefficients. Memoization enables PGE to evaluate each point of the search space only once, and a Pareto Priority Queue gives direction to the search. Sorting and simplification algorithms transform candidate expressions into a canonical form, reducing the size of the search space. Our results show that PGE performs well on 22 benchmarks from the SR literature, returning exact formulas in many cases. As a deterministic algorithm, PGE offers reliability and reproducibility of results, a key requirement for any system used by scientists at large. We believe PGE is a capable SR implementation, and we hope its alternative perspective leads the community to new ideas.
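The interplay of canonical forms, memoization, and a priority queue described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tuple encoding of expressions, the two production rules in `expand`, the caller-supplied `score` function, and all function names are assumptions made for illustration. The queue here orders candidates lexicographically by (score, size), a simplification that stands in for the Pareto non-dominated sorting the abstract describes, and coefficient fitting by non-linear regression is left to whatever `score` does.

```python
import heapq


def canonical(expr):
    """Flatten nested + and * nodes and sort their operands (assumed canonical form)."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a variable name or the abstract coefficient "C"
    op, *args = expr
    args = [canonical(a) for a in args]
    if op in ("+", "*"):
        flat = []
        for a in args:
            if isinstance(a, tuple) and a[0] == op:
                flat.extend(a[1:])  # merge a child that uses the same operator
            else:
                flat.append(a)
        args = sorted(flat, key=repr)  # deterministic operand order
    return (op, *args)


def size(expr):
    """Number of nodes in the expression tree, used as a complexity measure."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(size(a) for a in expr[1:])


def expand(expr, variables):
    """Hypothetical production rules: add a scaled variable, or multiply by a variable."""
    for v in variables:
        yield ("+", expr, ("*", "C", v))
        yield ("*", expr, v)


def search(seed, variables, score, max_pops):
    """Best-first enumeration; each canonical form is scored and queued at most once."""
    seen = set()   # memo of canonical forms already encountered
    heap = []      # entries are (score, size, insertion index, expression)
    order = []     # expressions in the order they were popped and expanded
    counter = 0    # unique tie-breaker, so expressions themselves are never compared

    def push(expr):
        nonlocal counter
        form = canonical(expr)
        if form in seen:
            return  # memoization: an equivalent form was already evaluated
        seen.add(form)
        heapq.heappush(heap, (score(form), size(form), counter, form))
        counter += 1

    push(seed)
    while heap and len(order) < max_pops:
        best = heapq.heappop(heap)[-1]
        order.append(best)
        for child in expand(best, variables):
            push(child)
    return order, seen
```

In this sketch, `x * x * y` can be reached both by extending `x * x` with `y` and by extending `x * y` with `x`. Both derivations reduce to the same canonical tuple, so the second one is dropped before it is scored. Every step is deterministic, so repeated runs with the same inputs visit expressions in the same order.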
