An Introduction to Probabilistic Programming

This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of probabilistic machine learning or programming languages, and ideally both. We start with a discussion of model-based reasoning and explain why conditioning, as a foundational computation, is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented for models denoted by probabilistic programs. In the second part of this document, we introduce a higher-order probabilistic programming language, with functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller. This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research. These directions point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.
