Constrained Bayesian Optimization with Noisy Experiments

Randomized experiments are the gold standard for evaluating the effects of changes to real-world systems. Data in these tests may be difficult to collect and outcomes may have high variance, resulting in potentially large measurement error. Bayesian optimization is a promising technique for efficiently optimizing multiple continuous parameters, but existing approaches degrade in performance when the noise level is high, limiting their applicability to many randomized experiments. We derive an expression for expected improvement under greedy batch optimization with noisy observations and noisy constraints, and develop a quasi-Monte Carlo approximation that allows it to be efficiently optimized. In simulations with synthetic functions, our method outperforms existing approaches on noisy, constrained problems. We further demonstrate the effectiveness of the method with two real-world experiments conducted at Facebook: optimizing a ranking system, and optimizing server compiler flags.
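To illustrate the core idea, the following is a minimal sketch (not the paper's exact estimator) of a quasi-Monte Carlo estimate of expected improvement when observations are noisy, so the incumbent best value is itself uncertain. It assumes maximization, and that the joint Gaussian-process posterior mean `mu` and covariance `cov` over the candidate point (index 0) and the previously observed points (indices 1..n) are already computed; scrambled Sobol' points are mapped through the normal inverse CDF to draw joint posterior samples.

```python
import numpy as np
from scipy.stats import qmc, norm

def qmc_noisy_ei(mu, cov, n_samples=1024, seed=0):
    """QMC estimate of expected improvement under noisy observations.

    mu, cov: joint GP posterior over [candidate, observed points].
    Index 0 is the candidate; the remaining indices are the noisy
    observations, whose true function values are integrated out.
    """
    d = len(mu)
    # Cholesky factor of the posterior covariance (jitter for stability).
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(d))
    # Scrambled Sobol' points in (0,1)^d, mapped to standard normals.
    sobol = qmc.Sobol(d=d, scramble=True, seed=seed)
    u = np.clip(sobol.random(n_samples), 1e-12, 1 - 1e-12)
    z = norm.ppf(u)
    # Joint posterior draws: each row has covariance L @ L.T = cov.
    f = mu + z @ L.T
    # Sampled incumbent: best true value among the noisy observations.
    best_observed = f[:, 1:].max(axis=1)
    improvement = np.maximum(f[:, 0] - best_observed, 0.0)
    return improvement.mean()
```

Averaging over joint draws, rather than plugging in the best noisy observation, is what lets the acquisition function account for measurement error in the incumbent; the paper extends this to batches and noisy constraints.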
