Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification

Gaussian processes (GPs) are one of the most successful frameworks for modeling uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive, since algorithms such as GP-BUCB require time at least quadratic in the number of dimensions and iterations to select each batch. In this paper, we introduce BBKB (Batch Budgeted Kernel Bandits), the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches. This is achieved through a new guarantee on the accuracy of tracked posterior variances, which allows BBKB to choose increasingly larger batches, improving over GP-BUCB. Moreover, we show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation used by BBKB, achieving a near-constant amortized cost per step. These findings are confirmed in several experiments, where BBKB is much faster than state-of-the-art methods.
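To make the adaptive-batching idea concrete, below is a minimal NumPy sketch of a GP-UCB-style loop that freezes the posterior at the start of a batch and keeps appending candidates while the frozen variance estimates remain accurate, ending the batch once an accumulated ratio of the form prod(1 + sigma^2(x)/lambda) exceeds a constant threshold. This is an illustration under assumed names and parameters (`rbf`, `posterior`, `select_batch`, the threshold `C`), not the authors' implementation: it uses an exact GP posterior for readability, whereas BBKB maintains a sparse Nyström approximation, and it simply masks chosen candidates instead of performing GP-BUCB's hallucinated rank-one variance updates.

```python
import numpy as np

def rbf(A, B, ell=0.2):
    # Squared-exponential kernel with unit variance, so k(x, x) = 1.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

def posterior(X, y, cand, noise=0.1):
    # Exact GP posterior mean and variance at the candidate points.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = 1.0 - (v ** 2).sum(axis=0)  # prior variance k(x, x) = 1
    return mu, np.maximum(var, 1e-12)

def select_batch(X, y, cand, beta=2.0, noise=0.1, C=10.0):
    # Freeze mean/variance at batch start; grow the batch while the frozen
    # variances are still within a constant factor of the true (updated)
    # ones, tracked via the accumulated ratio prod(1 + var/noise) <= C.
    mu, var = posterior(X, y, cand, noise)
    batch, ratio = [], 1.0
    while len(batch) < len(cand):
        i = int(np.argmax(mu + beta * np.sqrt(var)))
        ratio *= 1.0 + var[i] / noise
        if batch and ratio > C:
            break  # frozen variances may be stale: end batch, refit the GP
        batch.append(i)
        mu[i], var[i] = -np.inf, 0.0  # mask i so it is not picked again;
        # GP-BUCB instead hallucinates a rank-one variance update that
        # also shrinks the variances of nearby candidates
    return batch

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(5, 1))
    y = np.sin(6.0 * X[:, 0])
    cand = rng.uniform(size=(200, 1))
    print(select_batch(X, y, cand))  # indices forming one adaptive batch
```

In the paper, the same accumulated-ratio test serves a second purpose: it decides when the sparse GP approximation must be recomputed, so costly resparsifications are delayed until the tracked variances could actually have drifted, which is what yields the near-constant amortized per-step cost.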
