On Information Gain and Regret Bounds in Gaussian Process Bandits

Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex objective function $f$ from noisy feedback, which can be cast as a continuum-armed bandit problem. Upper bounds on the regret of several learning algorithms (GP-UCB, GP-TS, and their variants) are known under both a Bayesian setting (when $f$ is a sample from a Gaussian process (GP)) and a frequentist setting (when $f$ lives in a reproducing kernel Hilbert space). These regret bounds often rely on the maximal information gain $\gamma_T$ between $T$ observations and the underlying GP (surrogate) model. We provide general bounds on $\gamma_T$ based on the decay rate of the eigenvalues of the GP kernel; their specialisation to commonly used kernels improves the existing bounds on $\gamma_T$, and consequently the regret bounds relying on $\gamma_T$, in numerous settings. For the Matérn family of kernels, where lower bounds on $\gamma_T$, and on the regret under the frequentist setting, are known, our results close a substantial gap, polynomial in $T$, between the upper and lower bounds (up to factors logarithmic in $T$).
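
Throughout, the maximal information gain over $T$ observations is the standard quantity

$$\gamma_T := \max_{A \subseteq \mathcal{X},\, |A| = T} \frac{1}{2} \log \det\!\left( I_T + \sigma^{-2} K_A \right),$$

where $K_A$ is the kernel matrix of the points in $A$ and $\sigma^2$ is the observation noise variance. Since $\gamma_T$ drives the regret bounds discussed above, the following minimal sketch (illustrative only, not code from the paper) estimates it over a finite candidate set by greedy selection; by monotone submodularity of the information gain, the greedy value is within a $(1 - 1/e)$ factor of the true maximum over size-$T$ subsets. The squared-exponential kernel, lengthscale, and noise variance below are arbitrary assumptions.

```python
import numpy as np

def se_kernel(X, Y, lengthscale=0.2):
    # Squared-exponential kernel matrix k(x, y) = exp(-||x - y||^2 / (2 l^2)).
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def information_gain(K, noise_var=0.01):
    # I(y_A; f_A) = 0.5 * log det(I + sigma^{-2} K_A) for a GP observed
    # under i.i.d. Gaussian noise of variance sigma^2.
    _, logdet = np.linalg.slogdet(np.eye(K.shape[0]) + K / noise_var)
    return 0.5 * logdet

def greedy_gamma_T(X_cand, T, noise_var=0.01):
    # Greedily pick T points from a finite candidate set to (approximately)
    # maximise the information gain; submodularity guarantees the greedy
    # value is at least (1 - 1/e) times the maximum over size-T subsets.
    chosen, gain = [], 0.0
    for _ in range(T):
        best_gain, best_i = -np.inf, None
        for i in range(len(X_cand)):
            if i in chosen:
                continue
            K = se_kernel(X_cand[chosen + [i]], X_cand[chosen + [i]])
            g = information_gain(K, noise_var)
            if g > best_gain:
                best_gain, best_i = g, i
        chosen.append(best_i)
        gain = best_gain
    return gain, chosen

# Example: estimate gamma_T for T = 10 over 100 random points in [0, 1]^2.
X = np.random.default_rng(0).uniform(size=(100, 2))
gamma_hat, picked = greedy_gamma_T(X, T=10)
print(f"greedy estimate of gamma_T: {gamma_hat:.3f}")
```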
