Extrapolation and learning equations

In classical machine learning, regression is treated as a black-box process: a suitable function is identified from a hypothesis set without attempting to gain insight into the mechanism connecting inputs and outputs. In the natural sciences, however, finding an interpretable function for a phenomenon is the prime goal, since it allows one to understand and generalize results. This paper proposes a novel type of function-learning network, called the equation learner (EQL), that can learn analytical expressions and extrapolate to unseen domains. It is implemented as an end-to-end differentiable feed-forward network and therefore admits efficient gradient-based training. Sparsity regularization yields concise, interpretable expressions; often the true underlying source expression is identified.
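
The abstract describes the EQL architecture only at a high level. As a minimal sketch, one EQL-style layer can be pictured as a linear map whose outputs feed elementary base functions and multiplication units, trained with an L1 penalty for sparsity. The following PyTorch code illustrates this under assumptions not stated in the abstract: identity, sine, and cosine as the unary basis, pairwise products as the binary units, and all class names, unit counts, and hyperparameters chosen purely for illustration.

```python
import torch
import torch.nn as nn

class EQLLayer(nn.Module):
    """One EQL-style layer: a linear map whose outputs feed a mix of
    unary base functions (identity, sin, cos) and pairwise
    multiplication units. Unit counts and the function basis are
    illustrative assumptions, not the paper's exact configuration."""

    def __init__(self, in_dim, n_unary=6, n_binary=2):
        super().__init__()
        # One linear output per unary unit, two per multiplication unit.
        self.linear = nn.Linear(in_dim, n_unary + 2 * n_binary)
        self.n_unary, self.n_binary = n_unary, n_binary
        self.out_dim = n_unary + n_binary

    def forward(self, x):
        z = self.linear(x)
        u, b = z[..., :self.n_unary], z[..., self.n_unary:]
        funcs = (lambda t: t, torch.sin, torch.cos)  # cycled over units
        unary = torch.stack(
            [funcs[i % len(funcs)](u[..., i]) for i in range(self.n_unary)],
            dim=-1)
        binary = b[..., 0::2] * b[..., 1::2]  # multiplication units
        return torch.cat([unary, binary], dim=-1)


class EQL(nn.Module):
    """Stack of EQL layers with a purely linear readout, so the whole
    network evaluates an analytical expression in its inputs."""

    def __init__(self, in_dim, out_dim, depth=2):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(depth):
            layers.append(EQLLayer(d))
            d = layers[-1].out_dim
        self.hidden = nn.Sequential(*layers)
        self.readout = nn.Linear(d, out_dim)

    def forward(self, x):
        return self.readout(self.hidden(x))


def l1_penalty(model):
    """L1 sparsity regularizer over all weight matrices; driving most
    entries to zero is what leaves a concise, readable expression."""
    return sum(p.abs().sum()
               for name, p in model.named_parameters() if "weight" in name)


# Hypothetical training step: squared loss plus the sparsity penalty,
# optimized end-to-end with a gradient method such as Adam.
model = EQL(in_dim=2, out_dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(128, 2), torch.randn(128, 1)  # stand-in data
loss = torch.mean((model(x) - y) ** 2) + 1e-3 * l1_penalty(model)
opt.zero_grad()
loss.backward()
opt.step()
```

Reading off a learned formula then amounts to pruning near-zero weights and writing the remaining units out symbolically; the coefficient on the L1 term trades expression size against fit quality.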
