Multi-fidelity Gaussian Process Bandit Optimisation

In many scientific and engineering applications, we are tasked with maximising an expensive-to-evaluate black-box function f. Traditional formulations of this problem assume that only this single function is available. In many cases, however, cheap approximations to f can be obtained. For example, the expensive real-world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to cheaply eliminate regions where the function value is low, and then spend the expensive evaluations of f in a small but promising region to identify the optimum quickly. We formalise this task as a multi-fidelity bandit problem in which the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. Our theoretical analysis demonstrates that it exhibits precisely the behaviour described above and achieves better regret bounds than strategies that ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.
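The idea of using a cheap approximation to rule out low-value regions before spending expensive evaluations can be illustrated with a small upper-confidence-bound loop. The sketch below is a simplified two-fidelity illustration in the spirit of UCB-based multi-fidelity optimisation, not the paper's exact MF-GP-UCB algorithm. It assumes a 1-D domain discretised to a grid, an independent zero-mean GP with a fixed squared-exponential kernel for each fidelity, a known bound `zeta` on |f_high - f_low|, a constant exploration weight `beta`, and a switching threshold `gamma`. The functions `f_high` and `f_low` and every parameter value are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two sets of 1-D inputs (unit prior variance).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    # Zero-mean GP posterior mean and standard deviation at x_query.
    if len(x_obs) == 0:
        return np.zeros_like(x_query), np.ones_like(x_query)
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)


def f_high(x):
    # Hypothetical expensive target function.
    return np.sin(3 * x) + 0.5 * x


def f_low(x):
    # Hypothetical cheap approximation; |f_high - f_low| <= 0.1 everywhere.
    return f_high(x) + 0.1 * np.cos(5 * x)


def mf_ucb(n_iters=30, beta=4.0, zeta=0.1, gamma=0.2):
    grid = np.linspace(0.0, 2.0, 201)
    funcs = [f_low, f_high]      # fidelity 0 = cheap, fidelity 1 = expensive target
    offsets = [zeta, 0.0]        # assumed max deviation of each fidelity from f_high
    xs_obs, ys_obs = [[], []], [[], []]
    counts = [0, 0]
    for _ in range(n_iters):
        bounds, stds = [], []
        for m in range(2):
            mu, sd = gp_posterior(np.array(xs_obs[m]), np.array(ys_obs[m]), grid)
            # Each fidelity gives an upper bound on f_high: its UCB plus its offset.
            bounds.append(mu + np.sqrt(beta) * sd + offsets[m])
            stds.append(sd)
        # The tightest (smallest) of the valid upper bounds is used for selection.
        phi = np.minimum(bounds[0], bounds[1])
        i = int(np.argmax(phi))
        # Stay on the cheap fidelity while it is still uncertain at the chosen point.
        m = 0 if np.sqrt(beta) * stds[0][i] >= gamma else 1
        x = grid[i]
        xs_obs[m].append(x)
        ys_obs[m].append(funcs[m](x))
        counts[m] += 1
    if not xs_obs[1]:
        return None, counts
    best = float(xs_obs[1][int(np.argmax(ys_obs[1]))])
    return best, counts
```

Calling `best, counts = mf_ucb()` returns the best point evaluated on the expensive function (or `None` if it was never queried) and the number of queries made at each fidelity. In this sketch the cheap fidelity absorbs the early exploration, and the expensive function is queried only once the cheap model's uncertainty at the selected point falls below the threshold.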
