The hardness of conditional independence testing and the generalised covariance measure

It is often said that testing for conditional independence, i.e., testing whether two random vectors $X$ and $Y$ are independent given $Z$, is a hard statistical problem if $Z$ is a continuous random variable (or vector). In this paper, we prove that conditional independence is indeed a particularly difficult hypothesis to test for. Valid statistical tests are required to have a size that is smaller than a predefined significance level, and different tests usually have power against different classes of alternatives. We prove that a valid test for conditional independence does not have power against any alternative. Given the non-existence of a uniformly valid conditional independence test, we argue that tests must be designed so that their suitability for a particular problem may be judged easily. To address this need, we propose, in the case where $X$ and $Y$ are univariate, to nonlinearly regress $X$ on $Z$ and $Y$ on $Z$, and then to compute a test statistic based on the sample covariance between the residuals, which we call the generalised covariance measure (GCM). We prove that the validity of this form of test relies almost entirely on the weak requirement that the regression procedures are able to estimate the conditional means of $X$ given $Z$ and of $Y$ given $Z$ at a slow rate. We extend the methodology to handle settings where $X$ and $Y$ may be multivariate or even high-dimensional. While our general procedure can be tailored to the setting at hand by combining it with any regression technique, we develop theoretical guarantees for kernel ridge regression. A simulation study shows that the test based on the GCM is competitive with state-of-the-art conditional independence tests. Code is available as the R package GeneralisedCovarianceMeasure on CRAN.
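To make the construction concrete, here is a minimal sketch of a GCM-style test for univariate $X$ and $Y$. It is an illustration under assumptions, not the paper's implementation or the CRAN package's API: the choice of gradient boosting as the regression method, the function name gcm_test, and the simulated data are all hypothetical.

```python
# Sketch of a GCM-style test: regress X on Z and Y on Z, form products of
# residuals, and compare their normalised mean with a standard normal.
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingRegressor

def gcm_test(X, Y, Z):
    """Return the GCM test statistic and a two-sided p-value.

    X, Y: arrays of shape (n,); Z: array of shape (n, d).
    """
    n = len(X)
    # Any regression method estimating the conditional means at a slow
    # rate may be used; gradient boosting is an illustrative choice.
    res_x = X - GradientBoostingRegressor().fit(Z, X).predict(Z)
    res_y = Y - GradientBoostingRegressor().fit(Z, Y).predict(Z)
    # Products of residuals; their normalised mean is the test statistic.
    R = res_x * res_y
    T = np.sqrt(n) * R.mean() / np.sqrt((R ** 2).mean() - R.mean() ** 2)
    # Under the null of conditional independence, T is asymptotically N(0, 1).
    return T, 2 * norm.sf(abs(T))

# Example usage on data where X and Y are independent given Z.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 1))
X = Z[:, 0] + rng.normal(size=500)
Y = np.sin(Z[:, 0]) + rng.normal(size=500)
print(gcm_test(X, Y, Z))
```

The key point of the construction is that the test statistic only requires the two regression fits to be good enough on average; no single regression needs to be estimated at a parametric rate.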
