Optimal Order Simple Regret for Gaussian Process Bandits

Consider the sequential optimization of a continuous, possibly non-convex, and expensive-to-evaluate objective function $f$. The problem can be cast as a Gaussian process (GP) bandit where $f$ lives in a reproducing kernel Hilbert space (RKHS). The state-of-the-art analysis of several learning algorithms shows a significant gap between the lower and upper bounds on simple regret. With $N$ the number of exploration trials and $\gamma_N$ the maximal information gain, we prove an $\tilde{\mathcal{O}}(\sqrt{\gamma_N/N})$ bound on the simple regret of a pure exploration algorithm that is significantly tighter than the existing bounds. We show that this bound is order optimal, up to logarithmic factors, in the cases where a lower bound on regret is known. To establish these results, we prove novel and sharp confidence intervals for GP models that are applicable to RKHS elements and may be of broader interest.
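To make the pure exploration procedure concrete, here is a minimal sketch of one natural instantiation: query the point of maximal posterior variance for $N$ rounds, then report the maximiser of the final posterior mean. This is an illustrative assumption, not a verbatim transcription of the paper's algorithm; the squared-exponential kernel, the finite candidate grid standing in for the continuous domain, and the helper names (`rbf_kernel`, `gp_posterior`, `pure_exploration_gp`) are all our choices.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2.0 * lengthscale ** 2))

def gp_posterior(X_obs, y_obs, X_cand, noise_var=0.01):
    """GP posterior mean and variance at the candidate points."""
    K = rbf_kernel(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    k_star = rbf_kernel(X_obs, X_cand)        # shape (n_obs, n_cand)
    K_inv = np.linalg.inv(K)                  # fine for a small sketch
    mean = k_star.T @ K_inv @ y_obs
    # var_j = k(x_j, x_j) - k_*(x_j)^T K^{-1} k_*(x_j), with k(x, x) = 1
    var = 1.0 - np.einsum("ij,ik,kj->j", k_star, K_inv, k_star)
    return mean, np.maximum(var, 0.0)         # clip tiny negative numerical noise

def pure_exploration_gp(f, X_cand, N, noise_std=0.1, seed=0):
    """Query the most uncertain candidate for N rounds (pure exploration),
    then report the maximiser of the final posterior mean."""
    rng = np.random.default_rng(seed)
    X_obs, y_obs = [], []
    for _ in range(N):
        if not X_obs:
            idx = rng.integers(len(X_cand))   # no data yet: random start
        else:
            _, var = gp_posterior(np.array(X_obs), np.array(y_obs), X_cand)
            idx = int(np.argmax(var))         # maximal posterior variance
        X_obs.append(X_cand[idx])
        y_obs.append(f(X_cand[idx]) + noise_std * rng.standard_normal())
    mean, _ = gp_posterior(np.array(X_obs), np.array(y_obs), X_cand)
    return X_cand[int(np.argmax(mean))]

# Example: a smooth 1-D objective on [0, 1], discretised to 200 candidates.
f = lambda x: np.exp(-((x - 0.3) ** 2) / 0.05) * np.sin(8.0 * x)
X = np.linspace(0.0, 1.0, 200)
x_hat = pure_exploration_gp(f, X, N=50)
print(f"reported point: {x_hat:.3f}, "
      f"empirical simple regret: {f(X).max() - f(x_hat):.4f}")
```

Note that the exploration rule never consults the posterior mean, so the $N$ queries are pure exploration; the mean is used only once, to pick the reported point $\hat{x}$ whose simple regret $f(x^\star) - f(\hat{x})$ the $\tilde{\mathcal{O}}(\sqrt{\gamma_N/N})$ bound controls.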
