Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting

Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristical GP optimization approaches.

[1]  D. Freedman On Tail Probabilities for Martingales , 1975 .

[2]  M. L. Fisher,et al.  An analysis of approximations for maximizing submodular set functions—I , 1978, Math. Program..

[3]  Jonas Mockus,et al.  Bayesian Approach to Global Optimization , 1989 .

[4]  Grace Wahba,et al.  Spline Models for Observational Data , 1990 .

[5]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[6]  K. Chaloner,et al.  Bayesian Experimental Design: A Review , 1995 .

[7]  Maurice Queyranne,et al.  An Exact Algorithm for Maximum Entropy Sampling , 1995, Oper. Res..

[8]  Donald R. Jones,et al.  Efficient Global Optimization of Expensive Black-Box Functions , 1998, J. Glob. Optim..

[9]  Roger Woodard,et al.  Interpolation of Spatial Data: Some Theory for Kriging , 1999, Technometrics.

[10]  Peter Auer,et al.  Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[11]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.

[12]  Nello Cristianini,et al.  On the eigenspectrum of the gram matrix and the generalization error of kernel-PCA , 2005, IEEE Transactions on Information Theory.

[13]  Andreas Krause,et al.  Near-optimal Nonmyopic Value of Information in Graphical Models , 2005, UAI.

[14]  N. Zheng,et al.  Global Optimization of Stochastic Black-Box Systems via Sequential Kriging Meta-Models , 2006, J. Glob. Optim..

[15]  S. Ghosal,et al.  Posterior consistency of Gaussian process prior for nonparametric binary regression , 2006, math/0702686.

[16]  Sandeep Pandey,et al.  Handling Advertisements of Unknown Quality in Search Advertising , 2006, NIPS.

[17]  Csaba Szepesvári,et al.  Bandit Based Monte-Carlo Planning , 2006, ECML.

[18]  E. Vázquez,et al.  Convergence properties of the expected improvement algorithm , 2007, 0712.3744.

[19]  H. Robbins Some aspects of the sequential design of experiments , 1952 .

[20]  Tao Wang,et al.  Automatic Gait Optimization with Gaussian Process Regression , 2007, IJCAI.

[21]  Thomas P. Hayes,et al.  The Price of Bandit Information for Online Optimization , 2007, NIPS.

[22]  Thomas P. Hayes,et al.  Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.

[23]  Eli Upfal,et al.  Multi-Armed Bandits in Metric Spaces ∗ , 2008 .

[24]  Csaba Szepesvári,et al.  Online Optimization in X-Armed Bandits , 2008, NIPS.

[25]  Sham M. Kakade,et al.  Information Consistency of Nonparametric Gaussian Process Methods , 2008, IEEE Transactions on Information Theory.

[26]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[27]  Dorota Glowacka,et al.  GAUSSIAN PROCESS MODELLING OF DEPENDENCIES IN MULTI-ARMED BANDIT PROBLEMS , 2009 .

[28]  John N. Tsitsiklis,et al.  Linearly Parameterized Bandits , 2008, Math. Oper. Res..

[29]  John Shawe-Taylor,et al.  Regret Bounds for Gaussian Process Bandit Problems , 2010, AISTATS 2010.

[30]  Nando de Freitas,et al.  A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.