Probabilistic Data Analysis with Probabilistic Programming

Probabilistic techniques are central to data analysis, but different approaches can be difficult to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include hierarchical Bayesian models, multivariate kernel methods, discriminative machine learning, clustering algorithms, dimensionality reduction, and arbitrary probabilistic programs. We also demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that expresses data analysis tasks using a modeling language and a structured query language. The practical value is illustrated in two ways. First, CGPMs are used in an analysis that identifies satellite data records that probably violate Kepler's Third Law, by composing causal probabilistic programs with non-parametric Bayes in under 50 lines of probabilistic code. Second, for several representative data analysis tasks, we report lines of code and accuracy measurements for various CGPMs, along with comparisons against standard baseline solutions from Python and MATLAB libraries.
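The CGPM abstraction centers on a small generative interface: incorporate observations into a population, update latent state, simulate new values, and evaluate log densities. A minimal Python sketch of that interface, assuming a single Gaussian variable and illustrative method and class names (this is not the paper's reference implementation):

```python
import math
import random


class GaussianCGPM:
    """Sketch of a CGPM for one real-valued variable, modeled as a
    Gaussian whose parameters are refit by maximum likelihood."""

    def __init__(self):
        self.data = []
        self.mu = 0.0
        self.sigma = 1.0

    def incorporate(self, x):
        """Absorb one observation into the population."""
        self.data.append(x)

    def transition(self):
        """Update latent state; here, a simple MLE refit."""
        n = len(self.data)
        if n == 0:
            return
        self.mu = sum(self.data) / n
        var = sum((x - self.mu) ** 2 for x in self.data) / n
        self.sigma = math.sqrt(var) if var > 0 else 1.0

    def simulate(self, rng=random):
        """Draw one sample from the current learned distribution."""
        return rng.gauss(self.mu, self.sigma)

    def logpdf(self, x):
        """Log density of x under the current learned distribution."""
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2 * math.pi)
```

Because every model, from a kernel density estimator to a discriminative classifier, exposes the same simulate/logpdf surface, a query engine such as BayesDB can compose and compare them without knowing their internals.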
