Fast Non-Parametric Tests of Relative Dependency and Similarity

We introduce two novel non-parametric statistical hypothesis tests. The first test, called the relative test of dependency, enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC). The second test, called the relative test of similarity, is used to determine which of two samples from arbitrary distributions is significantly closer to a reference sample of interest. Here the relative measure of similarity is based on the Maximum Mean Discrepancy (MMD). To construct these tests, we use as test statistics the difference of two HSIC statistics and the difference of two MMD statistics, respectively. The resulting tests are consistent and unbiased, and have favorable convergence properties. The effectiveness of the relative dependency test is demonstrated on several real-world problems: we identify language groups from a multilingual parallel corpus, and we show that tumor location is more dependent on gene expression than on chromosome imbalance. We also demonstrate the performance of the relative test of similarity over a broad selection of model comparison problems in deep generative models.
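
As a rough illustration of the statistic behind the relative test of similarity, the sketch below computes the difference of two MMD estimates between a reference sample and two candidate samples. The Gaussian kernel, the fixed bandwidth `sigma`, the use of the standard unbiased squared-MMD estimator, and all function names are assumptions made for this example. It computes only the statistic; the null distribution and rejection threshold that an actual test requires are not reproduced here.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a and rows of b (assumed kernel choice)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(x, y, sigma=1.0):
    """Standard unbiased estimate of squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Exclude diagonal terms from the within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

def relative_mmd_statistic(x, y, z, sigma=1.0):
    """Difference of squared-MMD estimates; negative values suggest y is closer to x than z is."""
    return mmd2_unbiased(x, y, sigma) - mmd2_unbiased(x, z, sigma)

# Toy data: y is drawn from the same distribution as the reference x, z is shifted.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))  # reference sample
y = rng.normal(0.0, 1.0, size=(200, 2))  # candidate close to the reference
z = rng.normal(3.0, 1.0, size=(200, 2))  # candidate far from the reference
stat = relative_mmd_statistic(x, y, z)
print("relative MMD statistic:", stat)
```

On this toy data the statistic comes out negative, since `y` shares the reference distribution and `z` does not. Deciding whether such a difference is significant requires the variance of the statistic under the null, which this sketch does not estimate.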
