Ada-BKB: Scalable Gaussian Process Optimization on Continuous Domains by Adaptive Discretization

Gaussian process optimization is a successful class of algorithms (e.g., GP-UCB) for optimizing a black-box function through sequential evaluations. However, for functions with continuous domains, Gaussian process optimization has to rely either on a fixed discretization of the space or on the solution of a non-convex optimization subproblem at each evaluation. The first approach can negatively affect performance, while the second imposes a heavy computational burden. A third option, only recently studied theoretically, is to adaptively discretize the function domain. Even though this approach avoids the extra non-convex optimization costs, the overall computational complexity is still prohibitive: an algorithm such as GP-UCB has a runtime of O(T^4), where T is the number of iterations. In this paper, we introduce Ada-BKB (Adaptive Budgeted Kernelized Bandit), a no-regret Gaussian process optimization algorithm for functions on continuous domains that provably runs in O(T^2 d_eff^2), where d_eff is the effective dimension of the explored space, which is typically much smaller than T. We corroborate our theoretical findings with experiments on synthetic non-convex functions and on the real-world problem of hyper-parameter optimization, confirming the good practical performance of the proposed approach.
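
To make the computational bottleneck concrete, the sketch below implements the standard GP-UCB baseline on a fixed discretization of a one-dimensional domain; it is not the Ada-BKB algorithm, and it relies on assumptions not stated in the abstract (an RBF kernel, a small noise jitter, and hypothetical names such as `gp_ucb` and `rbf`). The explicit inverse of the t x t kernel matrix at step t costs O(t^3), which is what makes the total runtime of exact GP-UCB grow so quickly with T.

```python
# Minimal GP-UCB sketch on a fixed candidate grid (illustrative only, not Ada-BKB).
# All function and parameter names here are hypothetical.
import numpy as np

def rbf(A, B, lengthscale=0.2):
    # Pairwise squared distances -> RBF kernel matrix.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_ucb(f, candidates, T=30, beta=2.0, noise=1e-3):
    X, y = [], []
    for t in range(T):
        if not X:
            idx = np.random.randint(len(candidates))      # first point: pick at random
        else:
            Xa = np.array(X)
            K = rbf(Xa, Xa) + noise * np.eye(len(Xa))     # t x t kernel matrix
            Kinv = np.linalg.inv(K)                       # O(t^3) cost at every step
            k = rbf(candidates, Xa)                       # cross-covariances with the grid
            mu = k @ Kinv @ np.array(y)                   # posterior mean on the grid
            var = 1.0 - np.einsum("ij,jk,ik->i", k, Kinv, k)
            ucb = mu + np.sqrt(beta) * np.sqrt(np.maximum(var, 0.0))
            idx = int(np.argmax(ucb))                     # optimistic candidate
        x = candidates[idx]
        X.append(x)
        y.append(f(x))
    best = int(np.argmax(y))
    return X[best], y[best]

# Usage: maximize a toy 1-D function over a fixed grid of 200 candidates.
grid = np.linspace(0.0, 1.0, 200)[:, None]
x_best, y_best = gp_ucb(lambda x: float(np.sin(6 * x[0]) * x[0]), grid)
```

In this toy run, the grid plays the role of the fixed discretization discussed in the abstract: refining it improves the achievable accuracy but multiplies the cost of every acquisition maximization, which is the trade-off that adaptive discretization is meant to sidestep.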
