Testing Independence Between Linear Combinations for Causal Discovery

Recently, regression-based conditional independence (CI) tests have been applied to causal discovery. These methods offer an alternative way to test for CI by transforming CI into independence between residuals. In general, checking for independence is nontrivial when these residuals are linearly uncorrelated. Because they can represent high-order moments, kernel-based methods are usually used to achieve this goal, but at a considerable computational cost. In this paper, we investigate the independence between two linear combinations under a linear non-Gaussian structural equation model (SEM). We show that, in general, the first to fourth moments of the two linear combinations contain enough information to infer whether or not they are independent. The proposed method therefore measures CI in a simpler and more effective way, requiring only the first to fourth moments of the input variables. When applied to causal discovery, the proposed method outperforms kernel-based methods in both speed and accuracy, as validated by extensive experiments.
