Paper information - Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features


We develop an efficient and provably no-regret Bayesian optimization (BO) algorithm for optimizing black-box functions in high dimensions. We assume a generalized additive model with possibly overlapping variable groups. When the groups do not overlap, we provide the first provably no-regret polynomial-time (in the number of evaluations of the acquisition function) algorithm for solving high dimensional BO. To make the optimization efficient and feasible, we introduce a novel deterministic Fourier Features approximation based on numerical integration, with a detailed analysis for the squared exponential kernel. The error of this approximation decreases exponentially with the number of features, which allows for a precise approximation of both the posterior mean and variance. In addition, the cost of the kernel matrix inversion, measured in basic arithmetic operations, improves from cubic to essentially linear in the number of data points.
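The idea of a deterministic, quadrature-based Fourier feature map can be illustrated with a minimal one-dimensional sketch. It is not the paper's exact construction: it assumes a squared exponential kernel k(x, y) = exp(-(x - y)^2 / (2 l^2)), approximates its spectral integral with Gauss-Hermite quadrature, and then computes the posterior mean in feature space, so that the linear solve scales with the number of features rather than the number of data points. The function names (`qff_features`, `se_kernel`, `max_kernel_error`, `posterior_mean`) and all parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np


def qff_features(x, lengthscale=1.0, n_nodes=16):
    """Deterministic Fourier features for the 1-D squared exponential kernel.

    Assumption: Gauss-Hermite nodes are used for the Gaussian spectral
    density; this is an illustrative choice, not the paper's exact scheme.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    # Nodes t and weights w for integrals of the form int exp(-t^2) f(t) dt.
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables from t to frequencies of N(0, 1 / lengthscale^2).
    omega = np.sqrt(2.0) * t / lengthscale
    scale = np.sqrt(w / np.sqrt(np.pi))
    # phi(x) . phi(y) = sum_i (w_i / sqrt(pi)) * cos(omega_i * (x - y)).
    return np.hstack([scale * np.cos(x * omega), scale * np.sin(x * omega)])


def se_kernel(x, y, lengthscale=1.0):
    """Exact squared exponential kernel matrix between 1-D point sets."""
    d = np.asarray(x, float).reshape(-1, 1) - np.asarray(y, float).reshape(1, -1)
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def max_kernel_error(n_nodes, lengthscale=0.5):
    """Largest entrywise gap between the exact and approximate kernel on [0, 1]."""
    x = np.linspace(0.0, 1.0, 50)
    phi = qff_features(x, lengthscale, n_nodes)
    return float(np.max(np.abs(phi @ phi.T - se_kernel(x, x, lengthscale))))


def posterior_mean(x_train, y_train, x_test, noise_var=1e-2,
                   lengthscale=0.5, n_nodes=16):
    """GP posterior mean computed in feature space.

    With n data points and m features, the solve involves an m x m system,
    so the cost grows linearly in n for fixed m.
    """
    phi = qff_features(x_train, lengthscale, n_nodes)        # shape (n, m)
    phi_test = qff_features(x_test, lengthscale, n_nodes)    # shape (n_test, m)
    m = phi.shape[1]
    gram = phi.T @ phi + noise_var * np.eye(m)               # m x m system
    theta = np.linalg.solve(gram, phi.T @ np.asarray(y_train, float))
    return phi_test @ theta


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, max_kernel_error(n))
```

Running the script prints the maximum kernel approximation error for an increasing number of quadrature nodes, which gives a quick empirical view of how fast the error shrinks in this simplified setting. For an additive model with non-overlapping groups, one such feature map per group could be concatenated, though that extension is not shown here.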

Andreas Krause | Mojmír Mutný
