Parallelisation of the PC Algorithm

This paper describes a parallel version of the PC algorithm for learning the structure of a Bayesian network from data. The PC algorithm is a constraint-based algorithm consisting of five steps: the first step performs a set of (conditional) independence tests, and the remaining four steps identify the structure of the Bayesian network from the results of those tests. We describe a new approach to parallelisation of the (conditional) independence testing, since experiments show that this is by far the most time-consuming step. The proposed parallel PC algorithm is evaluated on data sets generated at random from five different real-world Bayesian networks. The results demonstrate that significant improvements in running time are possible using the proposed algorithm.
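As a rough illustration of why the independence-testing step lends itself to parallel execution, the sketch below distributes marginal (order-zero) independence tests over a pool of workers. It is not the scheme proposed in the paper: the function names `chi_square_2x2` and `marginal_skeleton`, the restriction to binary variables, the Pearson chi-square statistic, and the fixed critical value are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Assumption: binary variables, so each pairwise test has 1 degree of freedom.
# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE = 3.841


def chi_square_2x2(data, i, j):
    """Pearson chi-square statistic for binary columns i and j of data."""
    n = len(data)
    counts = [[0, 0], [0, 0]]
    for row in data:
        counts[row[i]][row[j]] += 1
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            row_total = counts[a][0] + counts[a][1]
            col_total = counts[0][b] + counts[1][b]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (counts[a][b] - expected) ** 2 / expected
    return stat


def marginal_skeleton(data, num_vars, workers=4):
    """Return the pairs (i, j) whose marginal independence is rejected.

    Each pairwise test reads the data but shares no mutable state with the
    other tests, so the tests can be handed to a pool in any order.
    """
    pairs = list(combinations(range(num_vars), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        stats = list(pool.map(lambda p: chi_square_2x2(data, *p), pairs))
    return {pair for pair, stat in zip(pairs, stats) if stat > CRITICAL_VALUE}


if __name__ == "__main__":
    # Toy data: column 1 copies column 0; column 2 is balanced against both.
    data = [(i % 2, i % 2, (i // 2) % 2) for i in range(40)]
    print(marginal_skeleton(data, 3))
```

A thread pool keeps the example short and portable. In CPython, pure-Python tests like these are limited by the global interpreter lock, so a process pool or a native test implementation would be needed to see an actual speed-up. The higher-order conditional tests of the PC algorithm, which condition on subsets of neighbouring variables, are omitted here.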
