Measure Transformer Semantics for Bayesian Machine Learning

The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
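As a rough illustration of sampling a prior and then observing a variable, the following sketch models finite discrete measures as Python dictionaries and composes a few transformers over them. It is not the calculus or the combinators of the paper: the names `bernoulli`, `extend`, `observe`, `normalize`, and `marginal` are hypothetical. It covers only the discrete case, so it cannot express observations of zero-probability events on continuous variables.

```python
from fractions import Fraction


def bernoulli(p):
    """Finite measure on {True, False} with mass p on True (hypothetical helper)."""
    p = Fraction(p)
    return {True: p, False: 1 - p}


def extend(measure, kernel):
    """Pair each outcome x with a fresh draw y from kernel(x)."""
    out = {}
    for x, mx in measure.items():
        for y, my in kernel(x).items():
            out[(x, y)] = out.get((x, y), Fraction(0)) + mx * my
    return out


def observe(measure, pred):
    """Keep only outcomes satisfying pred; the result is unnormalized."""
    return {x: m for x, m in measure.items() if pred(x)}


def normalize(measure):
    """Rescale a finite measure with non-zero total mass to total mass 1."""
    total = sum(measure.values())
    if total == 0:
        raise ValueError("observed event has zero mass in this discrete sketch")
    return {x: m / total for x, m in measure.items()}


def marginal(measure, f):
    """Push the measure forward along f, summing the mass of merged outcomes."""
    out = {}
    for x, m in measure.items():
        key = f(x)
        out[key] = out.get(key, Fraction(0)) + m
    return out


# Prior: two independent fair coins.
half = Fraction(1, 2)
prior = extend(bernoulli(half), lambda _: bernoulli(half))

# Observation: at least one coin came up heads (True).
posterior = normalize(observe(prior, lambda xy: xy[0] or xy[1]))

# Posterior marginal of the first coin.
first = marginal(posterior, lambda xy: xy[0])
print(first[True], first[False])
```

Keeping `observe` unnormalized and normalizing as a separate step means the mass that survives an observation stays available as the probability of the evidence. That choice belongs to this sketch and is not a claim about the paper's semantics.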
