ProBO: A Framework for Using Probabilistic Programming in Bayesian Optimization

Optimizing an expensive-to-query function is a common task in science and engineering, where it is beneficial to keep the number of queries to a minimum. A popular strategy is Bayesian optimization (BO), which leverages probabilistic models for this task. Most BO today uses Gaussian processes (GPs) or a few other surrogate models. However, there is a broad set of Bayesian modeling techniques that we may want to use to capture complex systems and reduce the number of queries. Probabilistic programs (PPs) are modern tools that allow for flexible model composition, incorporation of prior information, and automatic inference. In this paper, we develop ProBO, a framework for BO that uses only standard operations common to most PPs, allowing a user to drop in an arbitrary PP implementation and use it directly in BO. To do this, we describe black-box versions of popular acquisition functions that can be used in our framework automatically, without model-specific derivation, and show how to optimize these functions. We also introduce a model, which we term the Bayesian Product of Experts, that integrates into ProBO and can be used to combine information from multiple models implemented with different PPs. We show empirical results using multiple PP implementations and compare against standard BO methods.
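To make the black-box acquisition idea concrete, below is a minimal sketch (not the paper's implementation) of expected improvement estimated purely from posterior predictive samples, the kind of standard operation most PP systems expose. The `sample_posterior_predictive(x, n)` interface is a hypothetical stand-in for whatever draw-from-the-predictive call your PP of choice (e.g., PyMC3, Stan, Edward, Pyro) provides.

```python
import numpy as np

def black_box_ei(pp_model, X_cand, y_best, n_samples=200):
    """Monte Carlo expected improvement (for minimization), computed only
    from posterior predictive draws, so it applies to any probabilistic
    program that can sample its predictive distribution.

    pp_model.sample_posterior_predictive(x, n) -> n draws of y at input x
    (hypothetical interface; adapt to your PP implementation).
    """
    ei = np.empty(len(X_cand))
    for i, x in enumerate(X_cand):
        y_draws = pp_model.sample_posterior_predictive(x, n_samples)
        # Estimate E[max(y_best - y, 0)] by averaging over predictive draws.
        ei[i] = np.mean(np.maximum(y_best - np.asarray(y_draws), 0.0))
    return ei

# Usage: query next the candidate that maximizes the estimated acquisition.
# x_next = X_cand[np.argmax(black_box_ei(model, X_cand, y_best))]
```

Because the acquisition depends on the model only through predictive samples, the same routine applies unchanged to any PP implementation, which is the property the framework exploits.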
