Structure Discovery in Nonparametric Regression through Compositional Kernel Search

Despite its importance, choosing the structural form of the kernel in nonparametric regression remains a black art. We define a space of kernel structures which are built compositionally by adding and multiplying a small number of base kernels. We present a method for searching over this space of structures which mirrors the scientific discovery process. The learned structures can often decompose functions into interpretable components and enable long-range extrapolation on time-series datasets. Our structure search method outperforms many widely used kernels and kernel combination methods on a variety of prediction tasks.
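The search over compositional kernel structures can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices here are assumptions made for brevity: the three base kernels (SE, Per, Lin) with fixed unit hyperparameters, the noise level `noise=0.1`, the expansion operators in `expand`, and the BIC-style `score` that charges one penalty term per base kernel. A practical search would also optimize each candidate's hyperparameters before scoring it, and that step is omitted here.

```python
import math

# Hypothetical base kernels with fixed unit hyperparameters (no optimisation).
BASE = {
    "SE": lambda x, y: math.exp(-0.5 * (x - y) ** 2),                          # squared exponential
    "Per": lambda x, y: math.exp(-2.0 * math.sin(math.pi * abs(x - y)) ** 2),  # periodic, period 1
    "Lin": lambda x, y: x * y,                                                  # linear (dot product)
}


def k(expr, x, y):
    """Evaluate a kernel expression: a base-kernel name or a tuple (op, left, right)."""
    if isinstance(expr, str):
        return BASE[expr](x, y)
    op, left, right = expr
    if op == "+":
        return k(left, x, y) + k(right, x, y)
    return k(left, x, y) * k(right, x, y)  # op == "*"


def show(expr):
    """Render an expression as a readable string, e.g. '(SE + Lin)'."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({show(left)} {op} {show(right)})"


def n_base(expr):
    """Number of base kernels in the expression (a crude complexity count)."""
    return 1 if isinstance(expr, str) else n_base(expr[1]) + n_base(expr[2])


def expand(expr):
    """Assumed neighbourhood: add or multiply any subexpression by a base kernel,
    or swap a base kernel for a different one."""
    out = [(op, expr, b) for b in BASE for op in ("+", "*")]
    if isinstance(expr, str):
        out.extend(b for b in BASE if b != expr)
    else:
        op, left, right = expr
        out.extend((op, new, right) for new in expand(left))
        out.extend((op, left, new) for new in expand(right))
    return out


def log_marginal_likelihood(expr, xs, ys, noise=0.1):
    """GP log evidence log p(y | x), using a pure-Python Cholesky factorisation."""
    n = len(xs)
    K = [[k(expr, a, b) + (noise ** 2 if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Forward substitution L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((ys[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    return (-0.5 * sum(v * v for v in z)
            - sum(math.log(L[i][i]) for i in range(n))
            - 0.5 * n * math.log(2 * math.pi))


def score(expr, xs, ys):
    """Assumed BIC-style score (higher is better): evidence minus a penalty per base kernel."""
    return log_marginal_likelihood(expr, xs, ys) - 0.5 * n_base(expr) * math.log(len(xs))


def greedy_search(xs, ys, depth=2):
    """Start from the best base kernel, then move to the best-scoring neighbour
    until no neighbour improves the score or `depth` rounds have run."""
    best = max(BASE, key=lambda e: score(e, xs, ys))
    best_score = score(best, xs, ys)
    for _ in range(depth):
        cand_score, cand = max(((score(e, xs, ys), e) for e in expand(best)),
                               key=lambda t: t[0])
        if cand_score <= best_score:
            break
        best, best_score = cand, cand_score
    return best, best_score


# Toy data: a linear trend plus a period-1 oscillation.
xs = [0.25 * i for i in range(20)]
ys = [0.5 * x + math.sin(2 * math.pi * x) for x in xs]
best, best_score = greedy_search(xs, ys)
print(show(best), round(best_score, 2))
```

Sums and products of positive semidefinite kernels are again positive semidefinite. Every structure the grammar generates is therefore a valid covariance function, and once the noise term is added to the diagonal the Cholesky factorization in `log_marginal_likelihood` succeeds.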
