A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

Bayesian optimization is a highly efficient approach to optimizing objective functions that are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models, which are easy to optimize and support exact inference. While standard GP surrogates are well established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs, such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo (HMC), low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem-dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary, as deep kernel learning is relatively competitive; (iv) infinite-width BNNs are particularly promising, especially in high dimensions.
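
The surrogate-based loop studied here follows the standard Bayesian optimization pattern: fit a surrogate to the data observed so far, maximize an acquisition function under the surrogate's predictive distribution, query the objective at the chosen point, and repeat. The sketch below is a minimal, hypothetical illustration of this loop with an ensemble surrogate and the expected improvement acquisition. For brevity, the ensemble members are small polynomial fits to perturbed data, standing in for the independently trained networks of a true deep ensemble; every function name, hyperparameter, and toy objective here is our own assumption, not the paper's implementation.

```python
# Minimal sketch of a Bayesian optimization loop with an ensemble surrogate.
# All names, hyperparameters, and the toy objective are illustrative only.
import numpy as np
from scipy.stats import norm


def objective(x):
    # Hypothetical expensive black-box objective (1-D toy stand-in).
    return -np.sin(3 * x) - x ** 2 + 0.7 * x


def fit_ensemble(X, y, n_members=5, noise=0.05):
    # Stand-in for a deep ensemble: each "member" fits a small polynomial
    # to a perturbed copy of the data, mimicking independently trained nets.
    rng = np.random.default_rng(0)
    return [np.polyfit(X, y + noise * rng.standard_normal(len(y)), deg=3)
            for _ in range(n_members)]


def predict(members, X):
    # Predictive mean and (epistemic) std from ensemble disagreement.
    preds = np.stack([np.polyval(m, X) for m in members])
    return preds.mean(axis=0), preds.std(axis=0)


def expected_improvement(mu, sigma, best, xi=0.01):
    # Classic EI for maximization, guarding against zero predictive std.
    sigma = np.clip(sigma, 1e-9, None)
    z = (mu - best - xi) / sigma
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)


# BO loop: fit surrogate, maximize acquisition on a candidate grid, query.
X = np.array([-1.5, -0.5, 0.5, 1.5])
y = objective(X)
grid = np.linspace(-2.0, 2.0, 501)
for _ in range(10):
    members = fit_ensemble(X, y)
    mu, sigma = predict(members, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))
print("best x found:", X[np.argmax(y)], "best value:", y.max())
```

In the setting the paper studies, `fit_ensemble` and `predict` would be replaced by the BNN inference procedure under comparison (HMC posterior samples, stochastic MCMC samples, deep-ensemble members, or a deep kernel learning GP), with the acquisition maximization step unchanged.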
