A Fast PC Algorithm with Reversed-order Pruning and A Parallelization Strategy

The PC algorithm is the state-of-the-art algorithm for causal structure discovery on observational data. It can be computationally expensive in the worst case because the conditional independence tests are performed by exhaustive search. This makes the algorithm computationally intractable when the task contains several hundred or several thousand nodes, particularly when the true underlying causal graph is dense. We make the key observation that the conditioning set rendering two nodes independent is non-unique, and that including certain redundant nodes in it does not sacrifice the accuracy of the result. Based on this finding, our contribution is twofold. First, we introduce a reverse-order linkage pruning PC algorithm, which significantly increases the algorithm's efficiency. Second, we propose a parallel computing strategy for the statistical independence tests that leverages tensor computation and brings further speedup. We prove that the proposed algorithm does not induce a loss of statistical power under mild assumptions on the graph and the data dimensionality. Experimental results show that the single-threaded version of the proposed algorithm achieves a 6-fold speedup over the PC algorithm on a dense 95-node graph, and the parallel version achieves an 825-fold speedup. We also prove that the proposed algorithm is consistent under the same set of conditions as the conventional PC algorithm.
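The observation that the conditioning set is non-unique can be illustrated with a small simulation. The sketch below is not the proposed pruning algorithm; it only checks, on a hypothetical linear-Gaussian chain W -> X -> Z -> Y, that X and Y are rendered independent both by the minimal set {Z} and by the larger set {Z, W} that contains a redundant node. The graph, the coefficients, the helper names `partial_corr` and `fisher_z_pvalue`, and the choice of a Fisher z-test on partial correlations are all assumptions made for illustration.

```python
import math
import numpy as np

def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond,
    read off the inverse of the correlation matrix of the selected columns."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation is zero,
    using Fisher's z-transform with k conditioning variables."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))

# Hypothetical chain W -> X -> Z -> Y with unit-variance Gaussian noise.
rng = np.random.default_rng(0)
n = 20000
W = rng.normal(size=n)
X = 0.8 * W + rng.normal(size=n)
Z = 0.8 * X + rng.normal(size=n)
Y = 0.8 * Z + rng.normal(size=n)
data = np.column_stack([W, X, Z, Y])  # column order: W=0, X=1, Z=2, Y=3

r_marginal = partial_corr(data, 1, 3, [])    # X vs Y, no conditioning
r_given_z = partial_corr(data, 1, 3, [2])    # X vs Y given {Z}
r_given_zw = partial_corr(data, 1, 3, [2, 0])  # X vs Y given {Z, W}

p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_given_z = fisher_z_pvalue(r_given_z, n, 1)
p_given_zw = fisher_z_pvalue(r_given_zw, n, 2)

print("marginal:      r=%.3f p=%.3g" % (r_marginal, p_marginal))
print("given {Z}:     r=%.3f p=%.3g" % (r_given_z, p_given_z))
print("given {Z, W}:  r=%.3f p=%.3g" % (r_given_zw, p_given_zw))
```

In this toy graph W is an ancestor of X and not a collider, so adding it to the conditioning set opens no new path between X and Y. Which redundant nodes are safe to include in general depends on the graph assumptions stated in the paper.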
