Differentiable Genetic Programming for High-dimensional Symbolic Regression

Symbolic regression (SR) is the task of discovering hidden relationships in data in the form of mathematical expressions, and is widely regarded as an effective route to interpretable machine learning (ML). Genetic programming (GP) has long been the dominant approach to solving SR problems. However, as the scale of SR problems grows, GP often performs poorly and cannot effectively handle real-world high-dimensional problems. This limitation stems mainly from the stochastic evolutionary way in which traditional GP constructs its trees. In this paper, we propose a differentiable approach, named DGP, that constructs GP trees for high-dimensional SR for the first time. Specifically, a new data structure called the differentiable symbolic tree is proposed to relax the discrete tree structure into a continuous one, so that a gradient-based optimizer can be applied for efficient optimization. In addition, a sampling method is proposed to eliminate the discrepancy introduced by this relaxation and to guarantee valid symbolic expressions. Furthermore, a diversification mechanism is introduced to help the optimizer escape local optima and reach globally better solutions. With these designs, DGP can efficiently search for higher-performing GP trees and is therefore capable of handling high-dimensional SR. To demonstrate its effectiveness, we conducted extensive experiments against state-of-the-art methods based on both GP and deep neural networks. The results show that DGP outperforms these peer competitors on high-dimensional regression benchmarks with dimensionalities ranging from tens to thousands. On synthetic SR problems, DGP also achieves the best recovery rate across different noise levels. We believe this work can help establish SR as a powerful approach to interpretable ML for a broader range of real-world problems.
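
To make the continuous-relaxation idea more concrete, the sketch below (written purely for illustration, not the paper's actual implementation) shows one common way such a relaxation can be realized, in the spirit of differentiable architecture search: each internal node of a small fixed-shape tree holds learnable logits over a set of primitive operators, its output is the softmax-weighted mixture of all operator outputs, the mixture weights are trained by gradient descent, and each node is afterwards collapsed to its dominant operator (the paper's sampling step plays an analogous role in producing a valid expression). All names here (PRIMITIVES, DiffSymbolicNode, DiffSymbolicTree) are assumptions made for this example.

# Minimal sketch of a differentiable symbolic tree via continuous relaxation.
# Illustrative only; class/variable names are assumptions, not the paper's code.
import torch
import torch.nn as nn

# Binary primitive set; protected division avoids division by zero.
PRIMITIVES = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a / (b.abs() + 1e-6),
]

class DiffSymbolicNode(nn.Module):
    """One internal node: mixes all primitives applied to its two children."""
    def __init__(self):
        super().__init__()
        # Continuous "architecture" parameters, one logit per primitive.
        self.alpha = nn.Parameter(torch.zeros(len(PRIMITIVES)))

    def forward(self, left, right):
        weights = torch.softmax(self.alpha, dim=0)        # relax the discrete choice
        outs = torch.stack([op(left, right) for op in PRIMITIVES])
        return (weights.view(-1, 1) * outs).sum(dim=0)    # weighted mixture

class DiffSymbolicTree(nn.Module):
    """A tiny fixed-shape tree: (x0 ? x1) ? x2, with learnable operator choices."""
    def __init__(self):
        super().__init__()
        self.node1, self.node2 = DiffSymbolicNode(), DiffSymbolicNode()

    def forward(self, X):
        return self.node2(self.node1(X[:, 0], X[:, 1]), X[:, 2])

# Usage sketch: fit the relaxed tree to data with gradient descent, then
# discretize each node by keeping its highest-weight primitive.
X = torch.randn(256, 3)
y = (X[:, 0] + X[:, 1]) * X[:, 2]                          # hidden target formula
tree = DiffSymbolicTree()
opt = torch.optim.Adam(tree.parameters(), lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = ((tree(X) - y) ** 2).mean()
    loss.backward()
    opt.step()
# Indices of the dominant primitive at each node after training.
print([int(torch.argmax(n.alpha)) for n in (tree.node1, tree.node2)])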
