Conditional Distance Correlation

Statistical inference on conditional dependence is essential in many fields, including genetic association studies and graphical models. The classic measures focus on linear conditional correlations and cannot characterize nonlinear conditional relationships, including nonmonotonic ones. To overcome this limitation, we introduce a nonparametric measure of conditional dependence for multivariate random variables of arbitrary dimensions. Our measure possesses the necessary and intuitive properties of a correlation index. Briefly, it is zero almost surely if and only if two multivariate random variables are conditionally independent given a third random variable. More importantly, the sample version of this measure can be expressed elegantly as the root of a V- or U-process with random kernels and has desirable theoretical properties. Based on the sample version, we propose a test for conditional independence, which our numerical simulations show to be more powerful than some recently developed tests. The advantage of our test is even greater when the relationship between the multivariate random variables given the third random variable cannot be expressed as a linear or monotonic function of one random variable versus the other. We also show that the sample measure is consistent and weakly convergent, and that the test statistic is asymptotically normal. By applying our test in a real data analysis, we identify two conditionally associated gene expressions that otherwise could not be revealed. Thus, our measure of conditional dependence is not only an ideal concept but also has important practical utility. Supplementary materials for this article are available online.
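
As a rough illustration of why a distance-based index can capture nonmonotonic dependence that a linear correlation misses, the following minimal sketch computes the ordinary (unconditional) sample distance correlation of Székely, Rizzo, and Bakirov in its V-statistic form. It is not the conditional measure introduced here, which additionally conditions on a third variable; the function names `_centered_distances` and `distance_correlation` and the simulated data are assumptions made purely for illustration.

```python
import numpy as np


def _centered_distances(a):
    """Pairwise Euclidean distance matrix, double-centered (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D sample as n observations of dimension 1
    d = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Unconditional sample distance correlation (V-statistic form)."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()  # squared sample distance variance of x
    dvar_y = (B * B).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    # max(...) guards against tiny negative values from floating-point rounding.
    return float(np.sqrt(max(dcov2, 0.0) / denom))


# Simulated nonmonotonic relationship: y depends on x only through x**2.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
y = x**2 + 0.05 * rng.normal(size=500)

pearson = float(np.corrcoef(x, y)[0, 1])
dcor = distance_correlation(x, y)
print(f"Pearson correlation:  {pearson:.3f}")
print(f"Distance correlation: {dcor:.3f}")
```

In this simulated example the Pearson correlation stays close to zero while the distance correlation is clearly positive, which is the kind of nonlinear dependence the conditional measure is designed to detect after accounting for a third variable.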
