A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

We present a tutorial on Bayesian optimization, a method for finding the maximum of expensive cost functions. Bayesian optimization employs the Bayesian technique of setting a prior over the objective function and combining it with evidence to obtain a posterior distribution over the function. This posterior permits a utility-based selection of the next observation to make on the objective function, one that must trade off exploration (sampling from areas of high uncertainty) against exploitation (sampling areas likely to improve on the current best observation). We also present two detailed extensions of Bayesian optimization, with experiments---active user modeling with preferences, and hierarchical reinforcement learning---and a discussion of the pros and cons of Bayesian optimization based on our experiences.
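
As a concrete illustration of the loop described in the abstract, the sketch below fits a Gaussian process to the observations gathered so far and picks the next query point by maximizing the expected-improvement acquisition function. This is a minimal example under stated assumptions, not the tutorial's implementation: the one-dimensional objective, the fixed squared-exponential kernel hyperparameters, and the grid-based acquisition maximization are all illustrative choices.

```python
# Minimal Bayesian optimization sketch (illustrative only): GP prior with a
# fixed squared-exponential kernel, expected-improvement acquisition, and
# grid-based acquisition maximization. The objective and hyperparameters are
# assumptions made for this example, not the paper's experimental setup.
import numpy as np
from scipy.stats import norm

def objective(x):
    # Hypothetical "expensive" black-box function (1-D, for illustration).
    return -np.sin(3 * x) - x**2 + 0.7 * x

def sq_exp_kernel(a, b, length_scale=0.3, signal_var=1.0):
    # Squared-exponential (RBF) covariance between two sets of 1-D points.
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise_var=1e-6):
    # Standard GP regression: posterior mean and variance at the query points.
    K = sq_exp_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = sq_exp_kernel(x_obs, x_query)
    K_ss = sq_exp_kernel(x_query, x_query)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    var = np.diag(K_ss - K_s.T @ K_inv @ K_s)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, y_best):
    # EI(x) = (mu - y_best) * Phi(z) + sigma * phi(z), with z = (mu - y_best) / sigma.
    sigma = np.sqrt(var)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
x_grid = np.linspace(-1.0, 2.0, 500)      # candidate set for acquisition maximization
x_obs = rng.uniform(-1.0, 2.0, size=3)    # a few initial observations
y_obs = objective(x_obs)

for _ in range(10):                       # sequential observation budget
    mu, var = gp_posterior(x_obs, y_obs, x_grid)
    ei = expected_improvement(mu, var, y_obs.max())
    x_next = x_grid[np.argmax(ei)]        # exploration/exploitation trade-off via EI
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print(f"best x = {x_obs[y_obs.argmax()]:.3f}, best y = {y_obs.max():.3f}")
```

In practice the kernel hyperparameters would be learned from the data and the acquisition function maximized with a proper global optimizer rather than a grid, but the structure of the loop (fit the posterior, maximize the acquisition, observe, repeat) is the same.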
