Learning Causal Networks via Additive Faithfulness

In this paper we introduce a statistical model, called the additively faithful directed acyclic graph (AFDAG), for causal learning from observational data. Our approach is based on additive conditional independence (ACI), a recently proposed three-way statistical relation that shares many similarities with conditional independence but does not resort to multidimensional kernels. This distinctive feature strikes a balance between a parametric model and a fully nonparametric model, which makes the proposed model attractive for handling large networks. We develop an estimator for AFDAG based on a linear operator that characterizes ACI, and establish the consistency and convergence rates of this estimator, as well as the uniform consistency of the estimated DAG. Moreover, we introduce a modified PC-algorithm to implement the estimating procedure efficiently, so that its complexity is determined by the level of sparseness rather than the dimension of the network. Through simulation studies we show that our method outperforms existing methods when commonly assumed conditions such as Gaussian or Gaussian copula distributions do not hold. Finally, the usefulness of the AFDAG formulation is demonstrated through an application to a proteomics data set.
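
To illustrate why a PC-type procedure scales with sparseness rather than dimension, the following is a minimal sketch of the skeleton phase of the standard PC-algorithm. It is not the modified algorithm of this paper, and it does not implement the ACI-based estimator. The names `pc_skeleton`, `indep`, and `chain_oracle` are hypothetical, and the independence test is an oracle supplied by hand for a three-node chain X -> Y -> Z.

```python
from itertools import combinations


def pc_skeleton(nodes, indep, max_cond=None):
    """Skeleton phase of the standard PC-algorithm (illustrative sketch).

    nodes: list of sortable, hashable variable names.
    indep(x, y, S): hypothetical user-supplied test that returns True when
        x and y are judged independent given the set S.
    """
    # Start from the complete undirected graph.
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    size = 0
    while max_cond is None or size <= max_cond:
        # Stop once no node has enough neighbours to form a conditioning set.
        if all(len(adj[x]) - 1 < size for x in nodes):
            break
        for x in nodes:
            for y in sorted(adj[x]):
                # Conditioning sets come only from current neighbours of x,
                # so the work is governed by neighbourhood size.
                others = sorted(adj[x] - {y})
                for S in combinations(others, size):
                    if indep(x, y, set(S)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(S)
                        break
        size += 1
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sepset


def chain_oracle(x, y, S):
    # Assumed ground truth for the chain X -> Y -> Z:
    # the only independence is between X and Z given Y.
    return {x, y} == {"X", "Z"} and "Y" in S


edges, sepset = pc_skeleton(["X", "Y", "Z"], chain_oracle)
```

In practice the oracle would be replaced by a data-driven test. An ACI-based criterion would be the relevant choice here, but its form is not specified in this abstract.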
