How To Train Your Program

We present a Bayesian approach to machine learning with probabilistic programs. In our approach, training on available data is implemented as inference on a hierarchical model. The posterior distribution of model parameters is then used to stochastically condition a complementary model, such that inference on new data yields the same posterior distribution of the latent parameters corresponding to the new data as inference on a hierarchical model over the combined previous and new data, but at a lower computational cost. We frame the approach as a design pattern of probabilistic programming, referred to herein as ‘stump and fungus’, and illustrate its realization on a didactic case study.
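
The equivalence claimed above can be checked by hand in a conjugate setting. The following is a minimal sketch, not the paper's implementation: it assumes a normal-normal hierarchy with known variances (the constants `TAU0_SQ`, `TAU_SQ`, `SIGMA_SQ` and the function names `train`, `infer_new`, `infer_joint` are hypothetical, chosen for illustration), and it uses closed-form Gaussian updates rather than a probabilistic programming system. Under these assumptions, reusing the trained posterior of the shared hyperparameter gives the same posterior for the new group's latent parameter as conditioning the full hierarchical model on all data.

```python
from statistics import fmean

# Assumed known variances for a didactic normal-normal hierarchy:
#   mu ~ N(0, TAU0_SQ), theta_i ~ N(mu, TAU_SQ), y_ij ~ N(theta_i, SIGMA_SQ)
TAU0_SQ = 4.0
TAU_SQ = 1.0
SIGMA_SQ = 0.5


def train(old_groups):
    """Training: posterior N(m, s2) of mu given previously available groups."""
    precision = 1.0 / TAU0_SQ
    weighted_sum = 0.0
    for ys in old_groups:
        v = TAU_SQ + SIGMA_SQ / len(ys)  # variance of a group mean given mu
        precision += 1.0 / v
        weighted_sum += fmean(ys) / v
    return weighted_sum / precision, 1.0 / precision


def infer_new(mu_posterior, new_ys):
    """Inference on new data using only the trained posterior of mu."""
    m, s2 = mu_posterior
    prior_var = s2 + TAU_SQ              # theta_new with mu marginalized out
    noise_var = SIGMA_SQ / len(new_ys)   # variance of the new group mean
    precision = 1.0 / prior_var + 1.0 / noise_var
    mean = (m / prior_var + fmean(new_ys) / noise_var) / precision
    return mean, 1.0 / precision


def infer_joint(old_groups, new_ys):
    """Reference: condition the full hierarchical model on all data at once.

    The state is (mu, theta_new); every group mean is a scalar linear
    observation of one state component, absorbed by a Gaussian update.
    """
    mean = [0.0, 0.0]
    cov = [[TAU0_SQ, TAU0_SQ], [TAU0_SQ, TAU0_SQ + TAU_SQ]]
    obs = [(0, fmean(ys), TAU_SQ + SIGMA_SQ / len(ys)) for ys in old_groups]
    obs.append((1, fmean(new_ys), SIGMA_SQ / len(new_ys)))
    for k, y, r in obs:
        s = cov[k][k] + r                       # predictive variance
        gain = [cov[0][k] / s, cov[1][k] / s]
        resid = y - mean[k]
        row = cov[k][:]                         # row k, copied before update
        mean = [mean[i] + gain[i] * resid for i in range(2)]
        cov = [[cov[i][j] - gain[i] * row[j] for j in range(2)]
               for i in range(2)]
    return mean[1], cov[1][1]


old = [[1.2, 0.8, 1.0], [2.1, 1.9], [0.4, 0.6, 0.5, 0.7]]
new = [1.5, 1.7]
staged = infer_new(train(old), new)   # train once, then infer on new data
joint = infer_joint(old, new)         # inference on old and new data together
```

In this sketch `staged` and `joint` agree up to floating-point error, while `infer_new` never touches the old data. In a non-conjugate model the posterior of the hyperparameter would typically be available only as samples, which is where stochastic conditioning comes in.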
