Evolving Gaussian Process kernels from elementary mathematical expressions

Choosing an adequate kernel is crucial in many machine learning applications. The Gaussian process is a state-of-the-art technique for regression and classification that relies heavily on a kernel function. However, in the Gaussian process literature, kernels have usually been designed ad hoc, selected from a predefined set, or searched for in a space of compositions of kernels defined a priori. In this paper, we propose a genetic programming algorithm that represents a kernel function as a tree of elementary mathematical expressions. This representation can model a wider set of kernels, in which potentially better solutions can be found, but it also raises new challenges. The proposed algorithm is able to overcome these difficulties and find kernels that accurately model the characteristics of the data. The method has been tested on several real-world time-series extrapolation problems, where it improves on state-of-the-art results while reducing the complexity of the kernels.
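To make the tree representation concrete, the following minimal sketch encodes a kernel as a nested tuple of elementary operations and evaluates it into a Gram matrix. It is an illustration under our own assumptions, not the algorithm proposed in the paper: the operation set `OPS`, the tuple encoding, and the names `evaluate`, `gram`, `is_psd`, and `SE_TREE` are all hypothetical. The example tree spells out the familiar squared-exponential kernel exp(-(x1 - x2)^2 / (2 ell^2)). Because an arbitrary expression need not be a valid (positive semidefinite) kernel, the sketch also includes a numerical check; that this is among the challenges the abstract alludes to is our assumption.

```python
import numpy as np

# Illustrative set of elementary operations usable as internal tree nodes
# (an assumption; the paper's actual primitive set is not given here).
OPS = {
    "add": np.add,
    "sub": np.subtract,
    "mul": np.multiply,
    "div": np.divide,
    "neg": np.negative,
    "exp": np.exp,
}


def evaluate(node, x1, x2, params):
    """Recursively evaluate an expression tree on broadcastable inputs."""
    if isinstance(node, (int, float)):
        return node                      # numeric constant leaf
    if isinstance(node, str):
        if node == "x1":
            return x1                    # first kernel argument
        if node == "x2":
            return x2                    # second kernel argument
        return params[node]              # named hyperparameter leaf
    op, *children = node                 # internal node: (op, child, ...)
    return OPS[op](*(evaluate(c, x1, x2, params) for c in children))


def gram(tree, x, params):
    """Gram matrix K[i, j] = k(x[i], x[j]) for one-dimensional inputs x."""
    x = np.asarray(x, dtype=float)
    return np.asarray(evaluate(tree, x[:, None], x[None, :], params))


def is_psd(K, tol=1e-8):
    """Numerical check: symmetric with no eigenvalue below -tol."""
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)


# Squared-exponential kernel written out from elementary operations.
_diff = ("sub", "x1", "x2")
SE_TREE = (
    "exp",
    ("neg", ("div", ("mul", _diff, _diff),
             ("mul", 2.0, ("mul", "ell", "ell")))),
)

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 5)
    K = gram(SE_TREE, x, {"ell": 0.5})
    print("valid kernel matrix:", is_psd(K))
    # Negating the tree gives a symmetric but not PSD matrix.
    print("negated tree valid:", is_psd(gram(("neg", SE_TREE), x, {"ell": 0.5})))
```

In a genetic programming setting, trees of this kind would be the individuals that are recombined and mutated. A check such as `is_psd` only tests one particular Gram matrix, so it can reject invalid candidates but cannot prove that an expression is a valid kernel in general.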
