DIET: Conditional independence testing with marginal dependence measures of residual information

Conditional randomization tests (CRTs) assess whether a variable $x$ is predictive of another variable $y$, having observed covariates $z$. CRTs require fitting a large number of predictive models, which is often computationally intractable. Existing solutions to reduce the cost of CRTs typically split the dataset into a train and test portion, or rely on heuristics for interactions, both of which lead to a loss in power. We propose the decoupled independence test (DIET), an algorithm that avoids both of these issues by leveraging marginal independence statistics to test conditional independence relationships. DIET tests the marginal independence of two random variables: $F(x \mid z)$ and $F(y \mid z)$, where $F(\cdot \mid z)$ is a conditional cumulative distribution function (CDF). These variables are termed "information residuals." We give sufficient conditions for DIET to achieve finite-sample type-1 error control and power greater than the type-1 error rate. We then prove that when using the mutual information between the information residuals as a test statistic, DIET yields the most powerful conditionally valid test. Finally, we show DIET achieves higher power than other tractable CRTs on several synthetic and real benchmarks.
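
As a rough illustration of the idea described above, the sketch below estimates the conditional CDFs $F(x \mid z)$ and $F(y \mid z)$ with simple Gaussian linear models and then tests the marginal dependence of the resulting information residuals with a permutation test. The Gaussian conditional model, the Pearson-correlation statistic (used here in place of the mutual-information statistic discussed in the abstract), and all function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the DIET idea under simplifying assumptions:
# Gaussian linear conditional models for F(x|z) and F(y|z), and a
# permutation test on the correlation of the information residuals.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression


def information_residuals(v, z):
    """Estimate F(v | z) assuming a Gaussian conditional model for v given z."""
    reg = LinearRegression().fit(z, v)
    mean = reg.predict(z)
    scale = np.std(v - mean) + 1e-12  # homoscedastic noise scale (assumption)
    return stats.norm.cdf(v, loc=mean, scale=scale)


def diet_test(x, y, z, n_perm=1000, seed=None):
    """Permutation p-value for marginal dependence of the information residuals."""
    rng = np.random.default_rng(seed)
    rx = information_residuals(x, z)  # estimate of F(x | z)
    ry = information_residuals(y, z)  # estimate of F(y | z)
    observed = abs(np.corrcoef(rx, ry)[0, 1])
    perm = np.array(
        [abs(np.corrcoef(rng.permutation(rx), ry)[0, 1]) for _ in range(n_perm)]
    )
    # Standard permutation p-value with the +1 correction.
    return (1 + np.sum(perm >= observed)) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    z = rng.normal(size=(n, 3))
    x = z @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
    y = z @ np.array([0.3, 0.8, -1.0]) + 0.5 * x + rng.normal(size=n)
    # Small p-value expected, since y depends on x even after conditioning on z.
    print("p-value for H0: x independent of y given z:", diet_test(x, y, z, seed=1))
```

In this toy setup the residuals are approximately uniform under the fitted models, and permuting one residual vector against the other simulates the null of marginal independence, which is the quantity DIET reduces the conditional independence test to.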
