Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization

Can one parallelize complex exploration-exploitation tradeoffs? As an example, consider the problem of optimal high-throughput experimental design, where we wish to sequentially design batches of experiments in order to simultaneously learn a surrogate function mapping stimulus to response and identify the maximum of that function. We formalize the task as a multi-armed bandit problem in which the unknown payoff function is sampled from a Gaussian process (GP), and instead of a single arm, in each round we pull a batch of several arms in parallel. We develop GP-BUCB, a principled algorithm for choosing batches, based on the GP-UCB algorithm for sequential GP optimization. We prove a surprising result: compared to the sequential approach, the cumulative regret of the parallel algorithm increases only by a constant factor independent of the batch size B. Our results provide rigorous theoretical support for exploiting parallelism in Bayesian global optimization. We demonstrate the effectiveness of our approach on two real-world applications.
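
To make the batch-selection idea concrete, below is a minimal Python sketch in the spirit of GP-BUCB. It assumes a discrete candidate set, a squared-exponential kernel, and a constant confidence parameter beta; the function names, kernel choice, and parameter values are illustrative assumptions, not the paper's exact construction (which, for instance, uses a schedule for beta). The key mechanism is faithful to the batch setting: within a batch, the posterior mean is frozen at the last received feedback, while the posterior variance is shrunk with "hallucinated" observations at already-selected points, which is possible because the GP posterior variance depends only on where observations are made, not on their values.

```python
# A minimal sketch of batch UCB selection in the spirit of GP-BUCB.
# Kernel, beta, noise, and all names here are illustrative assumptions.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X_obs, y_obs, X_cand, noise=1e-2):
    """GP posterior mean and variance at candidate points, given noisy observations."""
    if len(X_obs) == 0:
        n = len(X_cand)
        return np.zeros(n), np.ones(n)
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_cand)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = 1.0 - (v ** 2).sum(0)  # prior variance k(x, x) = 1 for this kernel
    return mean, np.maximum(var, 1e-12)

def gp_bucb_batch(X_obs, y_obs, X_cand, batch_size=5, beta=4.0):
    """Select a batch of arms: the mean is frozen at the last feedback, while
    the variance is updated with hallucinated observations at points already
    chosen, since the GP posterior variance does not depend on observed values."""
    mean, _ = gp_posterior(X_obs, y_obs, X_cand)  # frozen until next feedback
    X_hall, y_hall, batch = list(X_obs), list(y_obs), []
    for _ in range(batch_size):
        _, var = gp_posterior(np.array(X_hall).reshape(-1, X_cand.shape[1]),
                              np.array(y_hall), X_cand)
        ucb = mean + np.sqrt(beta * var)
        i = int(np.argmax(ucb))
        batch.append(i)
        X_hall.append(X_cand[i])
        y_hall.append(mean[i])  # hallucinated value; it only shrinks the variance
    return batch

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_cand = rng.uniform(0, 1, size=(200, 1))   # candidate arms
    f = lambda X: np.sin(6 * X[:, 0])           # hidden payoff function
    X_obs = X_cand[:3]                          # a few initial pulls
    y_obs = f(X_obs) + 0.1 * rng.standard_normal(3)
    print("next batch indices:", gp_bucb_batch(X_obs, y_obs, X_cand))
```

Because the hallucinated observations reduce the confidence width around already-selected points, each subsequent selection in the batch is pushed toward regions the batch has not yet covered, which is what lets the parallel algorithm retain near-sequential regret.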
