Mixed-Variable Bayesian Optimization

The optimization of expensive-to-evaluate, black-box, mixed-variable functions, i.e., functions with both continuous and discrete inputs, is a difficult yet pervasive problem in science and engineering. In Bayesian optimization (BO), special cases of this problem that consider fully continuous or fully discrete domains have been widely studied. However, few methods exist for mixed-variable domains, and none of them can handle discrete constraints that arise in many real-world applications. In this paper, we introduce MiVaBo, a novel BO algorithm for the efficient optimization of mixed-variable functions that combines a linear surrogate model based on expressive feature representations with Thompson sampling. We propose an effective method to optimize its acquisition function, a challenging problem for mixed-variable domains, making MiVaBo the first BO method that can handle complex constraints over the discrete variables. Moreover, we provide the first convergence analysis of a mixed-variable BO algorithm. Finally, we show that MiVaBo is significantly more sample efficient than state-of-the-art mixed-variable BO algorithms on several hyperparameter tuning tasks, including the tuning of deep generative models.
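
To make the high-level recipe above concrete, here is a minimal sketch of Thompson sampling with a Bayesian linear surrogate over mixed-variable feature representations. The feature map phi(), the LinearThompsonBO class, and the candidate-enumeration step in suggest() are illustrative assumptions, not the paper's actual method; in particular, MiVaBo's acquisition optimizer handles discrete constraints with a dedicated procedure, whereas this sketch simply scores a finite candidate set.

```python
# Minimal sketch: Thompson sampling with a conjugate Bayesian linear
# surrogate over hand-crafted mixed-variable features. All names here
# (phi, LinearThompsonBO, suggest) are hypothetical illustrations.
import numpy as np

def phi(x_cont, x_disc):
    # Hypothetical feature map: bias term, raw continuous and discrete
    # inputs, plus simple continuous-discrete interaction features.
    base = np.concatenate([x_cont, x_disc])
    cross = np.outer(x_cont, x_disc).ravel()
    return np.concatenate([[1.0], base, cross])

class LinearThompsonBO:
    def __init__(self, dim, noise_var=0.1, prior_var=1.0):
        self.A = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)            # precision-weighted mean
        self.noise_var = noise_var

    def update(self, feats, y):
        # Standard conjugate Bayesian linear-regression update.
        self.A += np.outer(feats, feats) / self.noise_var
        self.b += feats * y / self.noise_var

    def sample_weights(self, rng):
        # Draw one weight vector from the Gaussian posterior.
        cov = np.linalg.inv(self.A)
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)

    def suggest(self, candidates, rng):
        # Thompson step: sample weights once, then pick the candidate
        # maximizing the sampled linear acquisition w^T phi(x). A real
        # mixed-variable optimizer would replace this enumeration.
        w = self.sample_weights(rng)
        scores = [w @ phi(xc, xd) for xc, xd in candidates]
        return candidates[int(np.argmax(scores))]
```

The sketch illustrates why a linear surrogate pairs naturally with Thompson sampling: once the weights are sampled, the acquisition function is an exact linear function of the features, so maximizing it over the discrete variables becomes a (constrained) integer program rather than a generic black-box search.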
