A parallel Bayesian network learning algorithm for classification

Bayesian networks (BNs), an important machine learning technique, have been widely used to model relationships among random variables. BNs are considered suitable for tasks such as prediction, classification, and causal analysis. In fact, Bayesian network models often achieve better precision than other commonly used models in classification and prediction. However, many BN structure learning algorithms, Max-Min Hill-Climbing (MMHC) among them, are heuristic, so the time they need to converge can grow sharply when the computational workload is massive. This paper aims to reduce the time cost of BN structure learning. We propose an approach that combines MapReduce with the MMHC method. After the training data set is split, several sub-Bayesian-network structures are learned simultaneously on Hadoop. To integrate the prediction results from these subnets easily, we employ a boosting method for the classification task. Our experimental results show good precision as well as better time performance in a real distributed environment.
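The split, learn, and combine pipeline described above can be illustrated with a minimal single-machine sketch. It is not the implementation evaluated in this paper and rests on several assumptions: a naive Bayes model (the simplest BN structure) stands in for each MMHC-learned subnet, a plain list comprehension stands in for the Hadoop map step, features are assumed binary, and the combination step is an AdaBoost-style weighting of already trained sub-models, since the exact boosting scheme is not specified here. All function names (split, learn_subnet, boost_weights, ensemble_predict) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def split(rows, k):
    """Partition the training rows into k shards (round-robin)."""
    return [rows[i::k] for i in range(k)]

def learn_subnet(shard):
    """Stand-in for the map step: fit naive Bayes counts on one shard.
    (Assumption: the paper learns each sub-structure with MMHC instead.)"""
    class_counts = Counter(label for _, label in shard)
    feat_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for features, label in shard:
        for i, v in enumerate(features):
            feat_counts[(i, label)][v] += 1
    return class_counts, feat_counts

def predict(model, features):
    """Return the most probable class under one sub-model."""
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        for i, v in enumerate(features):
            # Laplace smoothing; the "+ 2" assumes binary feature values.
            count = feat_counts[(i, label)][v]
            score += math.log((count + 1) / (class_counts[label] + 2))
        if score > best_score:
            best, best_score = label, score
    return best

def boost_weights(models, rows):
    """AdaBoost-style weighting of pre-trained sub-models (assumed scheme):
    each model gets a vote weight from its weighted error, and examples it
    misclassifies gain weight for the next model."""
    w = [1.0 / len(rows)] * len(rows)
    alphas = []
    for model in models:
        wrong = [predict(model, x) != y for x, y in rows]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        alphas.append(alpha)
        w = [wi * math.exp(alpha if bad else -alpha) for wi, bad in zip(w, wrong)]
        norm = sum(w)
        w = [wi / norm for wi in w]
    return alphas

def ensemble_predict(models, alphas, features):
    """Weighted vote over the predictions of all sub-models."""
    votes = defaultdict(float)
    for model, alpha in zip(models, alphas):
        votes[predict(model, features)] += alpha
    return max(sorted(votes), key=votes.get)

# Toy usage: the class label simply copies the first binary feature.
toy_rows = [((a, b), a) for a in (0, 1) for b in (0, 1)] * 6
toy_models = [learn_subnet(s) for s in split(toy_rows, 3)]  # would run in parallel
toy_alphas = boost_weights(toy_models, toy_rows)
print(ensemble_predict(toy_models, toy_alphas, (1, 0)))
```

Because each sub-model is fitted on its own shard with no shared state, the learning step maps naturally onto independent MapReduce tasks, and only the lightweight weighting step needs to see the combined predictions.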