Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior

Bayesian optimization usually assumes that a Bayesian prior is given. In practice, however, the strong theoretical guarantees of Bayesian optimization are often compromised because the parameters of the prior are unknown. In this paper, we adopt a variant of empirical Bayes: by estimating the Gaussian process prior from offline data sampled from that same prior and constructing unbiased estimators of the posterior, we show that variants of both GP-UCB and probability of improvement achieve a near-zero regret bound, which decreases to a constant proportional to the observational noise as the amount of offline data and the number of online evaluations grow. We verify our approach empirically on challenging simulated robotics problems featuring task and motion planning.
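To make the pipeline in the abstract concrete, here is a minimal sketch (our own illustration, not the authors' code) for a finite candidate set where each offline function was evaluated at the same inputs: the prior mean and covariance are estimated by the plug-in sample mean and unbiased sample covariance, the estimated prior is conditioned on online observations, and the next point is chosen by a GP-UCB rule. All function names, the `noise_var` argument, and the confidence weight `beta` are assumptions for the example; the paper's unbiased posterior estimators are more refined than this plug-in version.

```python
# A minimal sketch of meta Bayesian optimization with an estimated GP prior.
# Assumes a finite candidate set and offline functions observed at shared inputs.
import numpy as np

def estimate_prior(Y):
    """Y: (N, M) array; row i holds values of the i-th offline function
    at M shared candidate inputs. Returns empirical-Bayes estimates of
    the prior mean vector and covariance matrix."""
    mu_hat = Y.mean(axis=0)
    K_hat = np.cov(Y, rowvar=False)  # unbiased sample covariance
    return mu_hat, K_hat

def posterior(mu, K, idx, y, noise_var):
    """Condition the estimated prior on online observations y at indices idx."""
    K_oo = K[np.ix_(idx, idx)] + noise_var * np.eye(len(idx))
    K_ao = K[:, idx]
    m = mu + K_ao @ np.linalg.solve(K_oo, y - mu[idx])
    V = K - K_ao @ np.linalg.solve(K_oo, K_ao.T)
    return m, np.clip(np.diag(V), 0.0, None)  # clip guards small negatives

def ucb_step(mu, K, idx, y, noise_var, beta=2.0):
    """Select the next candidate index by an upper confidence bound."""
    m, v = posterior(mu, K, idx, y, noise_var)
    return int(np.argmax(m + beta * np.sqrt(v)))
```

A short usage example under the same assumptions, with synthetic offline data standing in for functions drawn from the unknown prior:

```python
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 20))        # 50 offline functions at 20 shared inputs
mu_hat, K_hat = estimate_prior(Y)
idx, y = [3], np.array([0.7])        # one online observation so far
nxt = ucb_step(mu_hat, K_hat, idx, y, noise_var=0.01)
```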
