Nonparametric causal structure learning in high dimensions

The PC and FCI algorithms are popular constraint-based methods for learning the structure of directed acyclic graphs (DAGs) in the absence and presence, respectively, of latent and selection variables. These algorithms, and their order-independent variants PC-stable and FCI-stable, have been shown to be consistent for learning sparse high-dimensional DAGs when conditional independence is tested with partial correlations. However, inferring conditional independences from partial correlations relies on the data being jointly Gaussian or generated from a linear structural equation model, an assumption that may be violated in many applications. To broaden the scope of high-dimensional causal structure learning, we propose nonparametric variants of the PC-stable and FCI-stable algorithms that employ the conditional distance covariance (CdCov) to test for conditional independence. As the key theoretical contribution, we prove that the high-dimensional consistency of PC-stable and FCI-stable carries over to general distributions over DAGs when CdCov-based nonparametric tests of conditional independence are used. Numerical studies demonstrate that the proposed algorithms perform nearly as well as PC-stable and FCI-stable for Gaussian distributions and offer advantages for non-Gaussian graphical models.
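As the abstract describes them, the proposed algorithms differ from PC-stable and FCI-stable in the conditional independence test they use. A minimal sketch of the PC-stable skeleton phase (the order-independent modification of Colombo and Maathuis), written against a hypothetical `ci_test(i, j, S)` callback, shows where a CdCov-based test would plug in. The names `CITest` and `pc_stable_skeleton` are illustrative assumptions, and the orientation steps of PC and the additional steps of FCI are omitted.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Set, Tuple

# Hypothetical interface: returns True when X_i and X_j are judged
# conditionally independent given the variables indexed by S.
CITest = Callable[[int, int, Tuple[int, ...]], bool]


def pc_stable_skeleton(p: int, ci_test: CITest):
    """Skeleton phase of PC-stable over variables 0..p-1.

    Returns the set of undirected edges and the separating sets found.
    """
    adj: Dict[int, Set[int]] = {i: set(range(p)) - {i} for i in range(p)}
    sepset: Dict[FrozenSet[int], Tuple[int, ...]] = {}
    level = 0  # size of the conditioning sets tested at this level
    while any(len(adj[i] - {j}) >= level for i in range(p) for j in adj[i]):
        # PC-stable: conditioning sets are drawn from adjacency sets frozen at
        # the start of the level, so the estimated skeleton does not depend on
        # the order in which variable pairs are visited.
        frozen = {i: set(adj[i]) for i in range(p)}
        for i in range(p):
            for j in sorted(frozen[i]):
                if j not in adj[i]:
                    continue  # edge already deleted earlier in this level
                candidates = sorted(frozen[i] - {j})
                if len(candidates) < level:
                    continue
                for S in combinations(candidates, level):
                    if ci_test(i, j, S):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = S
                        break
        level += 1
    edges = {frozenset((i, j)) for i in range(p) for j in adj[i]}
    return edges, sepset


if __name__ == "__main__":
    # Toy oracle for the chain 0 -> 1 -> 2: the only independence is 0 _||_ 2 | {1}.
    def chain_oracle(i, j, S):
        return {i, j} == {0, 2} and 1 in S

    edges, sepset = pc_stable_skeleton(3, chain_oracle)
    print(sorted(tuple(sorted(e)) for e in edges))  # [(0, 1), (1, 2)]
```

With finite samples, `ci_test` would wrap a test at some level α (for instance, a thresholded CdCov-type statistic) rather than an oracle; only that callback would change between the Gaussian and nonparametric variants.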
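To make the CdCov idea concrete, the sketch below computes the distance covariance between X and Y under the kernel-weighted empirical distribution around each sample value of Z, then averages over the sample points. It is a simplified plug-in statistic in the spirit of conditional distance covariance, not necessarily the exact estimator or test used in this work: the Gaussian kernel, the single bandwidth `h`, the averaging over sample points, and the helper names `pairwise_euclidean`, `weighted_dcov2` and `cdcov_stat` are all assumptions, and calibrating a null distribution (p-values) is not shown.

```python
import numpy as np


def pairwise_euclidean(a):
    """Euclidean distance matrix between the rows of `a` (a 1-D input is one feature)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def weighted_dcov2(A, B, w):
    """Squared distance covariance of the distribution putting mass w[i] on (x_i, y_i).

    A and B are the distance matrices of x and y; w is nonnegative and sums to 1.
    With w = 1/n this equals the usual V-statistic dCov^2 of Szekely, Rizzo and Bakirov.
    """
    t1 = w @ (A * B) @ w                      # E|X-X'||Y-Y'|
    t2 = (w @ A @ w) * (w @ B @ w)            # E|X-X'| * E|Y-Y'|
    t3 = 2.0 * np.sum(w * (A @ w) * (B @ w))  # 2 E|X-X'||Y-Y''|
    return t1 + t2 - t3


def cdcov_stat(x, y, z, h):
    """Average over sample points z_k of a kernel-weighted estimate of dCov^2(X, Y | Z = z_k).

    Hypothetical helper: Gaussian kernel with a single bandwidth h on ||z - z_i||.
    """
    A, B = pairwise_euclidean(x), pairwise_euclidean(y)
    Dz = pairwise_euclidean(z)
    K = np.exp(-(Dz ** 2) / (2.0 * h ** 2))
    W = K / K.sum(axis=1, keepdims=True)  # row k: Nadaraya-Watson weights at z_k
    return float(np.mean([weighted_dcov2(A, B, w) for w in W]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    z = rng.normal(size=n)
    x = z + 0.5 * rng.normal(size=n)
    y_ci = z + 0.5 * rng.normal(size=n)   # X independent of Y given Z
    y_dep = x + 0.1 * rng.normal(size=n)  # Y depends on X even given Z
    print("CI case:       ", cdcov_stat(x, y_ci, z, h=0.3))
    print("dependent case:", cdcov_stat(x, y_dep, z, h=0.3))
```

With these settings the dependent case should give a clearly larger statistic than the conditionally independent one. Because the statistic is a biased V-statistic whose scale depends on `h`, it is only meaningful relative to a null distribution, which this sketch does not construct.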
