A Fast PC Algorithm for High Dimensional Causal Discovery with Multi-Core PCs

Discovering causal relationships from observational data is a crucial problem with applications in many research areas. The PC algorithm is the state-of-the-art constraint-based method for causal discovery. However, its worst-case runtime is exponential in the number of nodes (variables), so it is inefficient when applied to high-dimensional data such as gene expression datasets. Meanwhile, advances in computer hardware over the last decade have made multi-core personal computers widely available. This provides strong motivation for designing a parallelized PC algorithm that runs on personal computers and requires no parallel computing knowledge from end users beyond their competency in using the PC algorithm. In this paper, we develop parallel-PC, a fast and memory-efficient PC algorithm that uses parallel computing. We apply our method to a range of synthetic and real-world high-dimensional datasets. Experimental results on a dataset from the DREAM 5 challenge show that the original PC algorithm could not produce any results after running for more than 24 hours, whereas parallel-PC finished within around 12 hours on a 4-core CPU computer and in less than six hours on an 8-core CPU computer. Furthermore, we integrate parallel-PC into a causal inference method for inferring miRNA-mRNA regulatory relationships. The experimental results show that parallel-PC helps improve both the efficiency and accuracy of the causal inference algorithm.
