Automatic Construction and Natural-Language Description of Nonparametric Regression Models

This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural-language text. Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important consequences. First, Gaussian processes can model functions in terms of high-level properties (e.g., smoothness, trends, periodicity, changepoints). Taken together with the compositional structure of our language of models, this allows us to automatically describe functions in simple terms. Second, the use of flexible nonparametric models and a rich language for composing them in an open-ended manner also results in state-of-the-art extrapolation performance, evaluated over 13 real time series data sets from various domains.
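The idea of composing Gaussian process models from parts that each encode a high-level property can be illustrated with a minimal sketch. It is not the system described in this paper: the helper names (`se`, `periodic`, `linear`, `add`, `mul`), the toy data, and the fixed hyperparameters are all assumptions chosen for illustration, and a real search would optimize hyperparameters and explore many more compositions. The sketch builds three candidate covariance functions from smooth, periodic, and linear base kernels and scores each by its Gaussian process log marginal likelihood.

```python
import numpy as np

# Base kernels, each encoding one high-level property (names are hypothetical).
def se(lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel: smooth functions."""
    def k(x, y):
        d = x[:, None] - y[None, :]
        return variance * np.exp(-0.5 * (d / lengthscale) ** 2)
    return k

def periodic(period=1.0, lengthscale=1.0, variance=1.0):
    """Periodic kernel: functions that repeat with the given period."""
    def k(x, y):
        d = x[:, None] - y[None, :]
        return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)
    return k

def linear(variance=1.0, offset=0.0):
    """Linear kernel: linear trends."""
    def k(x, y):
        return variance * (x[:, None] - offset) * (y[None, :] - offset)
    return k

# Composition operators: sums and products of kernels are again valid kernels.
def add(k1, k2):
    return lambda x, y: k1(x, y) + k2(x, y)

def mul(k1, k2):
    return lambda x, y: k1(x, y) * k2(x, y)

def log_marginal_likelihood(kernel, x, y, noise=0.1):
    """Log marginal likelihood of a zero-mean GP with Gaussian observation noise."""
    K = kernel(x, x) + noise ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))

# Toy data (an assumption): a linear trend plus a periodic component.
x = np.linspace(0.0, 4.0, 40)
y = 0.5 * x + np.sin(2.0 * np.pi * x)

# A few hand-picked compositions; hyperparameters are fixed, not optimized.
candidates = {
    "SE": se(),
    "LIN + PER": add(linear(), periodic(period=1.0)),
    "SE * PER": mul(se(lengthscale=2.0), periodic(period=1.0)),
}
scores = {name: log_marginal_likelihood(k, x, y) for name, k in candidates.items()}

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:10s} log marginal likelihood = {score:.2f}")
```

Because each base kernel corresponds to a describable property, a composed expression such as `LIN + PER` can be read off in plain terms ("a linear trend plus a periodic component"), which is the kind of correspondence the abstract relies on for generating descriptions.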
