Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification

Gaussian processes (GPs) are one of the most successful frameworks for modeling uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive, since algorithms such as GP-BUCB require time at least quadratic in the number of dimensions and iterations to select each batch. In this paper, we introduce BBKB (Batch Budgeted Kernel Bandits), the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches. This is achieved through a new guarantee on the accuracy of tracked posterior variances, which allows BBKB to choose increasingly larger batches, improving over GP-BUCB. Moreover, we show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation used by BBKB, achieving a near-constant amortized cost per step. These findings are confirmed in several experiments, where BBKB is much faster than state-of-the-art methods.
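To make the adaptive-batching idea concrete, below is a minimal NumPy sketch of a GP-UCB-style loop that freezes the posterior at the start of a batch and keeps appending candidates while the frozen variance estimates remain accurate, ending the batch once an accumulated ratio of the form prod(1 + sigma^2(x)/lambda) exceeds a constant threshold. This is an illustration under assumed names and parameters (`rbf`, `posterior`, `select_batch`, the threshold `C`), not the authors' implementation: it uses an exact GP posterior for readability, whereas BBKB maintains a sparse Nyström approximation, and it simply masks chosen candidates instead of performing GP-BUCB's hallucinated rank-one variance updates.

```python
import numpy as np

def rbf(A, B, ell=0.2):
    # Squared-exponential kernel with unit variance, so k(x, x) = 1.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

def posterior(X, y, cand, noise=0.1):
    # Exact GP posterior mean and variance at the candidate points.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = 1.0 - (v ** 2).sum(axis=0)  # prior variance k(x, x) = 1
    return mu, np.maximum(var, 1e-12)

def select_batch(X, y, cand, beta=2.0, noise=0.1, C=10.0):
    # Freeze mean/variance at batch start; grow the batch while the frozen
    # variances are still within a constant factor of the true (updated)
    # ones, tracked via the accumulated ratio prod(1 + var/noise) <= C.
    mu, var = posterior(X, y, cand, noise)
    batch, ratio = [], 1.0
    while len(batch) < len(cand):
        i = int(np.argmax(mu + beta * np.sqrt(var)))
        ratio *= 1.0 + var[i] / noise
        if batch and ratio > C:
            break  # frozen variances may be stale: end batch, refit the GP
        batch.append(i)
        mu[i], var[i] = -np.inf, 0.0  # mask i so it is not picked again;
        # GP-BUCB instead hallucinates a rank-one variance update that
        # also shrinks the variances of nearby candidates
    return batch

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(5, 1))
    y = np.sin(6.0 * X[:, 0])
    cand = rng.uniform(size=(200, 1))
    print(select_batch(X, y, cand))  # indices forming one adaptive batch
```

In the paper, the same accumulated-ratio test serves a second purpose: it decides when the sparse GP approximation must be recomputed, so costly resparsifications are delayed until the tracked variances could actually have drifted, which is what yields the near-constant amortized per-step cost.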
