Data-Driven Synthesis of Full Probabilistic Programs

Probabilistic programming languages (PPLs) give users a clean syntax for concisely representing probabilistic processes, along with easy access to sophisticated built-in inference algorithms. Unfortunately, writing a PPL program by hand can be difficult for non-experts, because it requires extensive knowledge of statistics and deep insight into the data. To make modeling easier, we have created a tool that synthesizes PPL programs from relational datasets. Our synthesizer uses the input data to generate a program sketch, then applies simulated annealing to complete the sketch. We introduce a data-guided approach to the program mutation stage of simulated annealing; this innovation allows our tool to scale to synthesizing complete probabilistic programs from scratch. We find that our synthesizer produces accurate programs from 10,000-row datasets in 21 s on average.
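
To make the idea of completing a sketch with simulated annealing and data-guided mutation more concrete, the following is a minimal illustrative sketch in Python, not the tool's actual implementation. It assumes a toy "program sketch" with a single Gaussian whose two holes (mu, sigma) must be filled, uses log-likelihood as the score, and treats "data-guided" as meaning that proposals are estimated from a random subsample of the data instead of being blind perturbations. All function names, the cooling schedule, and the subsample size are hypothetical choices made for illustration.

```python
import math
import random


def log_likelihood(params, data):
    """Score a candidate: Gaussian log-likelihood of the data under (mu, sigma)."""
    mu, sigma = params
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )


def data_guided_mutation(params, data, rng):
    """Hypothetical data-guided proposal: refill one hole from a data subsample."""
    sample = rng.sample(data, k=min(20, len(data)))
    mean = sum(sample) / len(sample)
    var = sum((x - mean) ** 2 for x in sample) / len(sample)
    mu, sigma = params
    if rng.random() < 0.5:
        return (mean, sigma)  # mutate the mean hole only
    return (mu, max(math.sqrt(var), 1e-3))  # mutate the scale hole, kept positive


def anneal(data, steps=500, t0=1.0, seed=0):
    """Simulated annealing over the two holes; returns the best candidate seen."""
    rng = random.Random(seed)
    current = (0.0, 1.0)  # arbitrary initial completion of the sketch
    current_score = log_likelihood(current, data)
    best, best_score = current, current_score
    for step in range(steps):
        # Linear cooling schedule (an assumption; other schedules work too).
        temperature = t0 * (1 - step / steps) + 1e-9
        candidate = data_guided_mutation(current, data, rng)
        candidate_score = log_likelihood(candidate, data)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse candidates with
        # probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score
```

Calling `anneal(column)` on a list of numeric values from one dataset column returns the best `(mu, sigma)` pair found and its score. Because each proposal is derived from the data, candidates start near plausible values, which is the intuition behind guiding mutations with data; a full synthesizer would mutate program structure as well as parameters.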
