Bayesian Kernelised Test of (In)dependence with Mixed-type Variables

A fundamental task in AI is to assess (in)dependence between mixed-type variables (e.g. text, image, sound). We propose a Bayesian kernelised correlation test of (in)dependence using a Dirichlet process model. The new measure of (in)dependence allows us to answer some fundamental questions: Based on data, are (mixed-type) variables independent? How likely is dependence/independence to hold? How high is the probability that two mixed-type variables are more than just weakly dependent? We establish theoretical properties of the approach and present algorithms for its fast computation. We empirically demonstrate the effectiveness of the proposed method by analysing its performance and by comparing it with other frequentist and Bayesian approaches on a range of datasets and tasks with mixed-type variables.
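
To make the questions above concrete, the following is a minimal sketch of how a posterior over a kernel correlation could be approximated. It is not the paper's algorithm: it assumes the Bayesian bootstrap (the zero-prior-strength limit of a Dirichlet process posterior), a normalised HSIC-style statistic as the correlation measure, and an RBF kernel on numeric toy data. The function names, the bandwidth, and the weak-dependence threshold of 0.1 are all hypothetical choices made for illustration. For genuinely mixed-type variables, the two Gram matrices would instead come from kernels suited to each data type.

```python
import numpy as np


def rbf_gram(x, bandwidth=1.0):
    """Gram matrix of an RBF kernel (illustrative choice, not from the paper)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def weighted_kernel_correlation(K, L, w):
    """Normalised HSIC-style correlation under observation weights w (sum to 1)."""
    n = len(w)
    # Weighted centring: subtract the w-weighted mean embedding from each point.
    C = np.eye(n) - np.outer(np.ones(n), w)
    Kc = C @ K @ C.T
    Lc = C @ L @ C.T
    W = np.outer(w, w)
    cross = np.sum(W * Kc * Lc)
    var_k = np.sum(W * Kc * Kc)
    var_l = np.sum(W * Lc * Lc)
    return cross / np.sqrt(var_k * var_l)


def posterior_correlation_samples(K, L, n_samples=200, seed=0):
    """Bayesian-bootstrap draws: Dirichlet(1, ..., 1) weights on the observations.

    Assumption: this stands in for a Dirichlet process posterior with
    vanishing prior strength; the paper's model may differ.
    """
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(K.shape[0]), size=n_samples)
    return np.array([weighted_kernel_correlation(K, L, w) for w in weights])


# Toy data: one dependent pair and one independent pair.
rng = np.random.default_rng(1)
x = rng.normal(size=60)
y_dep = x + 0.1 * rng.normal(size=60)
y_ind = rng.normal(size=60)

K = rbf_gram(x)
dep_samples = posterior_correlation_samples(K, rbf_gram(y_dep))
ind_samples = posterior_correlation_samples(K, rbf_gram(y_ind))

# Posterior probability that the correlation exceeds a (hypothetical)
# weak-dependence threshold.
threshold = 0.1
print("P(corr > 0.1 | dependent pair):  ", np.mean(dep_samples > threshold))
print("P(corr > 0.1 | independent pair):", np.mean(ind_samples > threshold))
```

The fraction of posterior draws above the threshold is one way to read "how likely is it that the variables are more than weakly dependent", in contrast to a frequentist test, which returns only a p-value against exact independence.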
