Symbolic Regression using Mixed-Integer Nonlinear Optimization

Symbolic Regression (SR) seeks a regression function whose form is not specified in advance: it may be any function that can be composed from a given list of operators. The problem is hard in machine learning, both theoretically and computationally. Genetic-programming-based methods, which heuristically search over a very large space of functions, are the most commonly used methods for SR problems. An alternative mathematical programming approach, proposed in the last decade, expresses the optimal symbolic expression as the solution of a system of nonlinear equations over continuous and discrete variables that minimizes a certain objective, and solves this system with a global solver for mixed-integer nonlinear programming problems. Algorithms based on the latter approach are often very slow. We propose a hybrid algorithm that combines mixed-integer nonlinear optimization with explicit enumeration and incorporates constraints from dimensional analysis. We show that, on some synthetic data sets, our algorithm is competitive with state-of-the-art SR software and with a recent physics-inspired method called AI Feynman.
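To make the roles of explicit enumeration and dimensional analysis concrete, here is a minimal Python sketch. It is not the hybrid algorithm proposed here, which couples enumeration with a mixed-integer nonlinear solver. It is a brute-force illustration that enumerates small expression trees, discards those whose units do not match the target, and fits a single scaling constant by least squares. The variables `m` and `v`, their units, the synthetic data, and all function names are assumptions made for this example.

```python
import itertools
import numpy as np

# Units as exponent vectors over (kg, m, s). The quantities are assumed examples.
UNITS = {"m": (1, 0, 0), "v": (0, 1, -1)}
TARGET_UNITS = (1, 2, -2)  # energy: kg m^2 s^-2
OPS = {"+": np.add, "-": np.subtract, "*": np.multiply, "/": np.divide}

def units_of(expr):
    """Return the unit vector of expr, or None if it is dimensionally inconsistent."""
    if isinstance(expr, str):
        return UNITS[expr]
    op, left, right = expr
    ul, ur = units_of(left), units_of(right)
    if ul is None or ur is None:
        return None
    if op in "+-":  # sums and differences need matching units
        return ul if ul == ur else None
    sign = 1 if op == "*" else -1  # products add exponents, quotients subtract
    return tuple(a + sign * b for a, b in zip(ul, ur))

def evaluate(expr, data):
    """Evaluate an expression tree on arrays of variable values."""
    if isinstance(expr, str):
        return data[expr]
    op, left, right = expr
    return OPS[op](evaluate(left, data), evaluate(right, data))

def enumerate_exprs(depth):
    """List expression trees of depth at most `depth` (duplicates are not removed)."""
    exprs = list(UNITS)
    for _ in range(depth):
        pairs = list(itertools.product(exprs, repeat=2))
        exprs = exprs + [(op, l, r) for op in OPS for l, r in pairs]
    return exprs

def search(data, y, depth=2):
    """Return (error, constant, expression) minimizing sum((c * f(x) - y)^2)."""
    best = None
    for expr in enumerate_exprs(depth):
        if units_of(expr) != TARGET_UNITS:  # dimensional-analysis filter
            continue
        with np.errstate(divide="ignore", invalid="ignore"):
            f = evaluate(expr, data)
        denom = float(f @ f)
        if not np.all(np.isfinite(f)) or denom == 0.0:
            continue  # skip degenerate candidates such as (v - v) * (m * v)
        c = float(f @ y) / denom  # closed-form least-squares scale
        err = float(np.sum((c * f - y) ** 2))
        if best is None or err < best[0]:
            best = (err, c, expr)
    return best

# Synthetic, noise-free data generated from E = 0.5 * m * v^2.
rng = np.random.default_rng(0)
data = {"m": rng.uniform(1.0, 5.0, 50), "v": rng.uniform(1.0, 10.0, 50)}
y = 0.5 * data["m"] * data["v"] ** 2

best_err, best_c, best_expr = search(data, y)
print(best_expr, best_c, best_err)
```

The unit filter removes most of the candidate trees before any fitting takes place, which is the kind of pruning that dimensional constraints provide. Several surviving trees are algebraically equivalent, so which one is reported depends on floating-point rounding.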
