Parallelisation of the PC Algorithm

This paper describes a parallel version of the PC algorithm for learning the structure of a Bayesian network from data. The PC algorithm is a constraint-based algorithm consisting of five steps: the first performs a set of conditional independence tests, and the remaining four use the results of those tests to identify the structure of the Bayesian network. We describe a new approach to parallelising the conditional independence testing, since experiments show that this is by far the most time-consuming step. The proposed parallel PC algorithm is evaluated on data sets generated at random from five different real-world Bayesian networks. The results demonstrate that the proposed algorithm can significantly reduce running time.
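To make the idea of parallel independence testing concrete, the following is a minimal sketch and not the scheme proposed in the paper, whose details are not given in this abstract. It assumes binary variables, covers only order-0 (marginal) tests with a G^2 statistic, and splits the variable pairs round-robin across workers. The function names (`g2_p_value`, `run_chunk`, `parallel_order0_tests`) and the parameters `n_workers` and `alpha` are hypothetical, introduced only for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def g2_p_value(data, i, j):
    """p-value of a G^2 test of marginal independence between binary columns i and j."""
    n = len(data)
    counts = [[0, 0], [0, 0]]
    for row in data:
        counts[row[i]][row[j]] += 1
    row_tot = [sum(counts[0]), sum(counts[1])]
    col_tot = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    g2 = 0.0
    for a in (0, 1):
        for b in (0, 1):
            observed = counts[a][b]
            if observed > 0:  # empty cells contribute nothing to G^2
                expected = row_tot[a] * col_tot[b] / n
                g2 += 2.0 * observed * math.log(observed / expected)
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    # Survival function of a chi-square distribution with 1 degree of freedom
    # (assumption: both variables are binary, so df = 1).
    return math.erfc(math.sqrt(g2 / 2.0))


def run_chunk(data, pairs, alpha):
    """Return the pairs in this chunk for which independence is not rejected."""
    return [(i, j) for i, j in pairs if g2_p_value(data, i, j) >= alpha]


def parallel_order0_tests(data, n_workers=4, alpha=0.05):
    """Distribute all pairwise marginal tests over n_workers and merge the results."""
    n_vars = len(data[0])
    pairs = list(combinations(range(n_vars), 2))
    # Round-robin split so each worker receives a similar number of tests.
    chunks = [pairs[k::n_workers] for k in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda chunk: run_chunk(data, chunk, alpha), chunks)
        independent = set()
        for found in results:
            independent.update(found)
    return independent
```

Threads are used here only to keep the sketch self-contained; in CPython they illustrate how the tests are partitioned rather than deliver a real speed-up, and a process pool or a distributed setup would be needed for CPU-bound testing. Higher-order tests, which condition on sets of neighbouring variables, are omitted.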
