Bayesian Optimization with Binary Auxiliary Information

This paper presents novel mixed-type Bayesian optimization (BO) algorithms that accelerate the optimization of a target objective function by exploiting correlated auxiliary information of binary type that can be obtained more cheaply, as arises in policy search for reinforcement learning and in hyperparameter tuning of machine learning models with early stopping. To achieve this, we first propose a mixed-type multi-output Gaussian process (MOGP) to jointly model the continuous target function and the binary auxiliary functions. We then propose information-based acquisition functions for mixed-type BO, namely mixed-type entropy search (MT-ES) and mixed-type predictive ES (MT-PES), based on the MOGP predictive belief of the target and auxiliary functions. The exact acquisition functions of MT-ES and MT-PES cannot be computed in closed form and need to be approximated. We derive an efficient approximation of MT-PES via a novel mixed-type random features approximation of the MOGP model, whose cross-correlation structure between the target and auxiliary functions can be exploited to improve the belief of the global target maximizer using observations from evaluating these functions. We also propose new practical constraints that relate the global target maximizer to the binary auxiliary functions. We empirically evaluate the performance of MT-ES and MT-PES with synthetic and real-world experiments.
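As background for the random features approximation mentioned above, the following is a minimal sketch of standard random Fourier features for a single-output Gaussian process with a squared-exponential kernel. It is not the paper's mixed-type construction; the function names (`make_rff_map`, `rbf_kernel`), the lengthscale, and the number of features are all assumptions chosen for illustration. The sketch checks that the inner product of the random features approximates the exact kernel, then draws one approximate function sample, which is the kind of object predictive entropy search methods typically maximize to sample a global maximizer.

```python
import numpy as np


def make_rff_map(dim, num_features, lengthscale, rng):
    """Return a feature map phi with phi(x) . phi(y) ~= k_RBF(x, y).

    Hypothetical helper: frequencies are drawn from the spectral density
    of the squared-exponential kernel, phases uniformly from [0, 2*pi).
    """
    W = rng.normal(scale=1.0 / lengthscale, size=(dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(X):
        # X has shape (n, dim); the result has shape (n, num_features).
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

    return phi


def rbf_kernel(X, Y, lengthscale):
    """Exact squared-exponential kernel matrix between rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


rng = np.random.default_rng(0)
lengthscale = 0.7      # assumed value, for illustration only
num_features = 5000    # assumed value, for illustration only

X = rng.uniform(-1.0, 1.0, size=(20, 2))
phi = make_rff_map(X.shape[1], num_features, lengthscale, rng)

# Compare the feature-space inner products with the exact kernel matrix.
K_approx = phi(X) @ phi(X).T
K_exact = rbf_kernel(X, X, lengthscale)
max_err = np.abs(K_approx - K_exact).max()

# An approximate GP prior sample is linear in the features:
# f(x) = phi(x) . theta with theta ~ N(0, I).
theta = rng.standard_normal(num_features)
f_sample = phi(X) @ theta
```

Because the sampled function is a finite linear combination of cosines, it can be evaluated and differentiated anywhere at low cost, which is what makes this representation convenient for locating maximizers of function samples.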
