Blang: Bayesian declarative modelling of arbitrary data structures.

Consider a Bayesian inference problem where a variable of interest does not take values in a Euclidean space. Such "non-standard" data structures are in fact fairly common: they arise in latent discrete factor models, in networks, and in domain-specific problems such as sequence alignment and reconstruction, pedigrees, and phylogenies. In principle, Bayesian inference should be particularly well suited to such scenarios, as the Bayesian paradigm provides a principled way to obtain confidence assessments for random variables of any type. However, much of the recent work on making Bayesian analysis more accessible and computationally efficient has focused on inference in Euclidean spaces. In this paper, we introduce Blang, a domain-specific language (DSL) and library aimed at bridging this gap. Blang allows users to perform Bayesian analysis on arbitrary data types using a declarative syntax similar to BUGS, augmented with intuitive language additions that let users define data types of their own choosing. To perform inference at scale on such arbitrary state spaces, Blang leverages recent advances in parallelizable, non-reversible Markov chain Monte Carlo methods.
