Scalable Score Computation for Learning Multinomial Bayesian Networks over Distributed Data

In this paper, we focus on the problem of learning a Bayesian network over distributed data stored in a commodity cluster. Specifically, we address the challenge of computing the scoring function over distributed data in a scalable manner, a fundamental task during learning. We propose a novel approach designed to achieve: (a) scalable score computation using the principle of gossiping; (b) lower resource consumption via a probabilistic approach for maintaining scores based on the properties of a Markov chain; and (c) effective distribution of tasks during score computation on large datasets by synergistically combining well-known hashing techniques. Through theoretical analysis, we show that our approach requires less communication bandwidth than a MapReduce-style computation. It is also superior to the batch-style processing of MapReduce for recomputing scores when new data become available.
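To make the gossip-based idea concrete, below is a minimal, hypothetical sketch of push-sum gossip averaging (in the spirit of gossip-based aggregate computation), showing how partial counts held by cluster nodes could be combined into the global sufficient statistics that a decomposable multinomial score needs, without a central coordinator. The simulated topology, number of rounds, uniform random peer selection, and the function name `push_sum` are assumptions made for illustration; this is not the paper's actual protocol.

```python
# Hypothetical sketch: push-sum gossip averaging used to aggregate per-node
# counts (e.g., a partial N_ijk sufficient statistic) into a global count.
# Every node repeatedly keeps half of its (value, weight) pair and pushes the
# other half to a random peer; value/weight converges to the global average.
import random


def push_sum(local_counts, rounds=100, seed=0):
    """Return each node's estimate of the global sum of local_counts."""
    rng = random.Random(seed)
    n = len(local_counts)
    values = list(map(float, local_counts))  # local partial counts
    weights = [1.0] * n                      # push-sum weights, conserved at n in total
    for _ in range(rounds):
        new_values = [0.0] * n
        new_weights = [0.0] * n
        for i in range(n):
            target = rng.randrange(n)  # uniformly random peer (assumption)
            # Keep half locally, send half to the chosen peer.
            for j, share in ((i, 0.5), (target, 0.5)):
                new_values[j] += values[i] * share
                new_weights[j] += weights[i] * share
        values, weights = new_values, new_weights
    # value/weight approximates the global average; multiply by n for the sum.
    return [v / w * n for v, w in zip(values, weights)]


if __name__ == "__main__":
    # Four nodes, each holding a partial count for one (variable, parent-config, value) cell.
    partial_counts = [120, 80, 95, 105]
    print(push_sum(partial_counts))  # each node's estimate of the exact total, 400
```

Because sums and weights are conserved across rounds, every node's ratio converges to the global average, so each node can locally reconstruct the total count needed by a decomposable score term; the sketch simulates all nodes in one process purely for illustration.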
