Detecting Dependencies in High-Dimensional, Sparse Databases Using Probabilistic Programming and Non-parametric Bayes

Sparse databases with hundreds of variables are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes a new approach to dependency detection that combines probabilistic programming, information theory, and non-parametric Bayesian modeling. The key ideas are to (i) build an ensemble of joint probability models for the whole database via approximate posterior inference in CrossCat, a non-parametric factorial mixture; (ii) identify independencies by analyzing model structures; and (iii) report the distribution on conditional mutual information induced by posterior uncertainty over the ensemble of models. This paper presents experiments showing that the approach finds relationships that pairwise correlation misses, including context-specific independencies, on databases of mathematics exam scores and global indicators of macroeconomic development.
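Steps (ii) and (iii) can be illustrated with a small sketch. It is not the paper's implementation and does not use CrossCat: as a stand-in assumption, each model in the ensemble is a standard bivariate Gaussian summarized by a single correlation `rho`, and `rho == 0.0` plays the role of a model structure that places the two variables in independent blocks. The names `ensemble_rhos`, `mc_mutual_information`, and `dependence_probability` are hypothetical. For models that do not imply independence, mutual information is estimated by simple Monte Carlo as the average of log p(x, y) - log p(x) - log p(y) over samples from the model.

```python
import numpy as np

def standard_normal_logpdf(x):
    # Log density of a standard normal marginal.
    return -0.5 * (np.log(2.0 * np.pi) + x ** 2)

def bivariate_normal_logpdf(x, y, rho):
    # Log density of a standard bivariate normal with correlation rho.
    quad = (x ** 2 - 2.0 * rho * x * y + y ** 2) / (1.0 - rho ** 2)
    return -np.log(2.0 * np.pi) - 0.5 * np.log(1.0 - rho ** 2) - 0.5 * quad

def mc_mutual_information(rho, n_samples, rng):
    # Step (ii): if the model structure implies independence
    # (here, the stand-in condition rho == 0), report exactly zero
    # without sampling.
    if rho == 0.0:
        return 0.0
    # Otherwise estimate I(X;Y) = E[log p(x,y) - log p(x) - log p(y)]
    # using samples drawn from the model's joint distribution.
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n_samples)
    x, y = xy[:, 0], xy[:, 1]
    log_ratio = (bivariate_normal_logpdf(x, y, rho)
                 - standard_normal_logpdf(x)
                 - standard_normal_logpdf(y))
    return float(np.mean(log_ratio))

rng = np.random.default_rng(0)

# Hypothetical ensemble: one correlation per sampled model (assumed values).
ensemble_rhos = [0.0, 0.0, 0.6, 0.7, 0.8]

# Step (iii): one mutual information estimate per model; the spread of
# these values reflects uncertainty over the ensemble.
mi_samples = [mc_mutual_information(r, 20000, rng) for r in ensemble_rhos]

# Fraction of models whose structure does not imply independence.
dependence_probability = float(np.mean([r != 0.0 for r in ensemble_rhos]))

print("MI per model:", np.round(mi_samples, 3))
print("Fraction of models with dependence:", dependence_probability)
```

In this toy setting the exact value is known in closed form, I(X;Y) = -0.5 log(1 - rho^2), so the Monte Carlo estimates can be checked directly. Reporting the whole collection `mi_samples`, rather than a single point estimate, is what distinguishes the ensemble view from a pairwise correlation score.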
