Densities of Almost Surely Terminating Probabilistic Programs are Differentiable Almost Everywhere

We study the differential properties of higher-order statistical probabilistic programs with recursion and conditioning. Our starting point is an open problem posed by Hongseok Yang: which class of statistical probabilistic programs has densities that are differentiable almost everywhere? To formalise the problem, we consider Statistical PCF (SPCF), an extension of call-by-value PCF with real numbers and with constructs for sampling and conditioning. We give SPCF a sampling-style operational semantics à la Borgström et al., and study the associated weight function (commonly referred to as the density) and value function on the set of possible execution traces. Our main result is that almost surely terminating SPCF programs, generated from a set of primitive functions (e.g. the set of analytic functions) satisfying mild closure properties, have weight and value functions that are differentiable almost everywhere. We use a stochastic form of symbolic execution to reason about almost everywhere differentiability. A by-product of this work is that almost surely terminating deterministic (S)PCF programs with real parameters denote functions that are differentiable almost everywhere. Our result is of practical interest, because the correctness of major gradient-based inference algorithms requires the density function to be differentiable almost everywhere.
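To make the notions of weight function and value function on traces concrete, the following is a minimal Python sketch and not the SPCF semantics itself. It assumes that sampling reads the next number from a fixed trace of draws in [0, 1] and that conditioning multiplies a running weight. The names `Trace`, `sample`, `score`, `program` and `run`, and the example program, are hypothetical illustrations that do not come from the paper. A trace that does not match a terminating run exactly is given weight 0 and no value, which is also an assumption of this sketch.

```python
import math


class TraceExhausted(Exception):
    """Raised when the program asks for more draws than the trace holds."""


class Trace:
    """Replays a fixed list of draws and accumulates a weight (hypothetical helper)."""

    def __init__(self, draws):
        self.draws = list(draws)
        self.pos = 0
        self.weight = 1.0

    def sample(self):
        # Sampling consumes the next entry of the trace.
        if self.pos >= len(self.draws):
            raise TraceExhausted()
        u = self.draws[self.pos]
        self.pos += 1
        return u

    def score(self, w):
        # Conditioning multiplies the running weight.
        self.weight *= w


def program(t):
    """Illustrative program: the branch taken depends on the first draw."""
    x = t.sample()
    if x < 0.5:
        t.score(2.0 * x)
        return x
    y = t.sample()
    t.score(math.exp(-y))
    return x + y


def run(draws):
    """Return (weight, value) for a trace; (0.0, None) if the trace does not fit."""
    t = Trace(draws)
    try:
        value = program(t)
    except TraceExhausted:
        return 0.0, None
    if t.pos != len(t.draws):
        # Leftover draws: the trace is longer than the run that consumed it.
        return 0.0, None
    return t.weight, value
```

In this sketch, `run([0.25])` follows the first branch and `run([0.75, 0.0])` follows the second. On traces of length one the weight is `2 * x` for `x < 0.5` and 0 otherwise, so it is differentiable everywhere except at the branch boundary `x = 0.5`, a set of measure zero. This is the kind of behaviour the main result describes for almost surely terminating programs.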
