Densities of almost-surely terminating probabilistic programs are differentiable almost everywhere

We study the differential properties of higher-order statistical probabilistic programs with recursion and conditioning. Our starting point is an open problem posed by Hongseok Yang: which class of statistical probabilistic programs has densities that are differentiable almost everywhere? To formalise the problem, we consider Statistical PCF (SPCF), an extension of call-by-value PCF with real numbers and constructs for sampling and conditioning. We give SPCF a sampling-style operational semantics à la Borgström et al., and study the associated weight function (commonly referred to as the density) and value function on the set of possible execution traces. Our main result is that almost-surely terminating SPCF programs, generated from a set of primitive functions (e.g. the set of analytic functions) satisfying mild closure properties, have weight and value functions that are almost-everywhere differentiable. We use a stochastic form of symbolic execution to reason about almost-everywhere differentiability. A by-product of this work is that almost-surely terminating deterministic (S)PCF programs with real parameters denote functions that are almost-everywhere differentiable. Our result is of practical interest, as almost-everywhere differentiability of the density function is required for the correctness of major gradient-based inference algorithms.
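To make the notions of weight and value functions on traces concrete, the following is a minimal Python sketch of a trace-driven run of a toy program. It is not SPCF and not the semantics used in the paper; the names `run`, `d_weight` and `TraceMismatch`, and the toy program itself, are assumptions made purely for illustration. The program branches on its first sample, so its weight function is differentiable everywhere except on the measure-zero boundary where the first sample equals 0.5.

```python
from typing import List, Tuple


class TraceMismatch(Exception):
    """Raised when a trace has too few or too many samples for the run."""


def run(trace: List[float]) -> Tuple[float, float]:
    """Run a toy program on a fixed trace and return (weight, value).

    Hypothetical toy program (an assumption, not taken from the paper):
        x = sample
        if x <= 0.5: score(2 * x); return x
        else:        y = sample; score(y * y); return x + y
    """
    samples = iter(trace)
    weight = 1.0

    def sample() -> float:
        # Sampling reads the next entry of the trace instead of drawing randomly.
        try:
            return next(samples)
        except StopIteration:
            raise TraceMismatch("trace too short")

    x = sample()
    if x <= 0.5:
        weight *= 2.0 * x      # conditioning multiplies the weight
        value = x
    else:
        y = sample()
        weight *= y * y
        value = x + y

    if next(samples, None) is not None:
        raise TraceMismatch("trace too long")
    return weight, value


def d_weight(trace: List[float], i: int, h: float = 1e-6) -> float:
    """Central-difference estimate of the partial derivative of the weight
    with respect to the i-th trace entry. Only meaningful away from the
    branch boundary x == 0.5."""
    up = list(trace)
    down = list(trace)
    up[i] += h
    down[i] -= h
    return (run(up)[0] - run(down)[0]) / (2.0 * h)
```

For example, `run([0.25])` takes the first branch and `run([0.75, 0.5])` takes the second; `d_weight` can be evaluated at both traces, whereas at a trace starting with exactly 0.5 the two perturbed runs would take different branches and the estimate would not correspond to a derivative.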
