Gaussian Process Bandits with Adaptive Discretization

This paper studies the problem of maximizing a black-box function $f:\mathcal{X} \to \mathbb{R}$ in the Bayesian framework with a Gaussian Process (GP) prior. A new algorithm for this problem is proposed, and high-probability bounds on its simple and cumulative regret are established. The query-point selection rule in most existing methods involves an exhaustive search over an increasingly fine sequence of uniform discretizations of $\mathcal{X}$. The proposed algorithm, in contrast, adaptively refines $\mathcal{X}$, which leads to lower computational complexity, particularly when $\mathcal{X}$ is a subset of a high-dimensional Euclidean space. Beyond the computational gains, sufficient conditions are identified under which the regret bounds of the new algorithm improve upon known results. Finally, an extension of the algorithm to contextual bandits is proposed, and high-probability bounds on the contextual regret are presented.
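
For reference, in the standard formulation of these objectives: if $x_t$ is the point queried at round $t$ and $x^* \in \arg\max_{x \in \mathcal{X}} f(x)$, the cumulative regret after $T$ rounds is $R_T = \sum_{t=1}^{T} \big(f(x^*) - f(x_t)\big)$, and the simple regret is $S_T = f(x^*) - f(\hat{x}_T)$, where $\hat{x}_T$ is the point recommended after the $T$ queries.

The Python sketch below illustrates the general idea of adaptive discretization inside a GP-UCB-style loop on $\mathcal{X} = [0, 1]$. It is a minimal illustration under stated assumptions, not the paper's algorithm: the RBF kernel, the cell-width slack term standing in for discretization error, the split-vs-sample rule, and all hyperparameters (`beta`, length scale `ls`, noise level) are choices made for the example.

```python
import numpy as np

# --- Minimal GP posterior with an RBF kernel (hyperparameters are assumptions) ---
def rbf(a, b, ls=0.2):
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xq, noise=1e-2):
    """Posterior mean and std at query points Xq given observations (X, y)."""
    if len(X) == 0:
        return np.zeros(len(Xq)), np.ones(len(Xq))  # prior: mean 0, variance 1
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

# --- Adaptive discretization of X = [0, 1]: each leaf cell is an interval ---
def adaptive_gp_ucb(f, T=30, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    leaves = [(0.0, 1.0)]  # current partition of the domain
    X, y = [], []
    for _ in range(T):
        centers = np.array([(lo + hi) / 2 for lo, hi in leaves])
        widths = np.array([hi - lo for lo, hi in leaves])
        mu, sd = gp_posterior(np.array(X), np.array(y), centers)
        # UCB per cell, inflated by a width term that shrinks as cells shrink
        # (a Lipschitz-style slack; an assumption made for this sketch).
        ucb = mu + beta * sd + widths
        i = int(np.argmax(ucb))
        lo, hi = leaves[i]
        # Refine the most promising cell once its GP uncertainty falls below
        # the geometric slack; otherwise query its center.
        if beta * sd[i] < widths[i] and (hi - lo) > 1e-3:
            mid = (lo + hi) / 2
            leaves[i:i + 1] = [(lo, mid), (mid, hi)]
        else:
            X.append(centers[i])
            y.append(f(centers[i]) + 0.01 * rng.standard_normal())
    y_best, x_best = max(zip(y, X))
    return x_best, y_best, leaves

if __name__ == "__main__":
    f = lambda x: np.exp(-(x - 0.3) ** 2 / 0.01) + 0.5 * np.sin(8 * x)
    x_best, y_best, leaves = adaptive_gp_ucb(f)
    print(f"best x ~ {x_best:.3f}, f(x) ~ {y_best:.3f}, leaves: {len(leaves)}")
```

The intended behavior is that cells with low UCB scores are never refined, so the partition concentrates around promising regions of $\mathcal{X}$. This is the source of the computational advantage the abstract describes: a uniform discretization of comparable resolution would require evaluating the posterior at every grid point in every round, which becomes prohibitive as the dimension of $\mathcal{X}$ grows.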
