Fast and Scalable Distributed Loopy Belief Propagation on Real-World Graphs

Given graphs with millions or billions of vertices and edges, how can we efficiently make inferences based on partial knowledge? Loopy Belief Propagation (LBP) is a graph inference algorithm widely used in applications including social network analysis, malware detection, recommendation, and image restoration. The algorithm computes approximate marginal probabilities of the vertices of a graph in time linear in the number of edges. However, for real-world graphs with millions or billions of vertices and edges, this cost overwhelms the computing power of a single machine. Moreover, such large-scale graphs do not fit into the memory of a single machine. Although several distributed LBP methods have been proposed, previous works do not consider the properties of real-world graphs, especially the effect of the power-law degree distribution on LBP. Our work therefore focuses on developing a fast and scalable LBP for such large real-world graphs in a distributed environment. In this paper, we propose DLBP, a Distributed Loopy Belief Propagation algorithm that efficiently computes LBP across multiple machines. By setting the correct convergence criterion and carefully scheduling the computations, DLBP achieves up to a 10.7x speedup over standard distributed LBP. We show that DLBP scales near-linearly with respect to both the number of machines and the number of edges.
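To make the message-passing idea concrete, the following is a minimal single-machine sketch of standard synchronous LBP on a pairwise model; it is not DLBP. The function name `loopy_bp`, the shared edge potential `psi`, the toy graph, and the stopping rule (largest change in any message below `tol`) are all assumptions made for illustration; the abstract does not specify DLBP's convergence criterion or scheduling. For brevity the sketch recomputes the incoming-message product for every directed edge, so it does not attain the per-iteration cost linear in the number of edges mentioned above.

```python
def loopy_bp(edges, priors, psi, max_iters=50, tol=1e-6):
    """Synchronous LBP on a simple undirected graph (illustrative sketch).

    edges:  list of (u, v) pairs, each undirected edge listed once
    priors: dict vertex -> list of prior probabilities per state
    psi:    square matrix, psi[s][t] = compatibility of states s and t
    """
    n_states = len(psi)
    neighbors = {v: [] for v in priors}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # One message per directed edge, initialised to uniform.
    msgs = {(u, v): [1.0 / n_states] * n_states
            for u in neighbors for v in neighbors[u]}

    for _ in range(max_iters):
        new_msgs = {}
        for (i, j) in msgs:
            # Prior of i times all messages into i, except the one from j.
            pre = list(priors[i])
            for k in neighbors[i]:
                if k != j:
                    for s in range(n_states):
                        pre[s] *= msgs[(k, i)][s]
            # Push through the edge potential and normalise.
            out = [sum(pre[s] * psi[s][t] for s in range(n_states))
                   for t in range(n_states)]
            z = sum(out)
            new_msgs[(i, j)] = [x / z for x in out]

        # Assumed stopping rule: largest change in any message entry.
        delta = max(abs(a - b)
                    for e in msgs
                    for a, b in zip(new_msgs[e], msgs[e]))
        msgs = new_msgs
        if delta < tol:
            break

    # Belief of a vertex: prior times all incoming messages, normalised.
    beliefs = {}
    for i in neighbors:
        b = list(priors[i])
        for k in neighbors[i]:
            for s in range(n_states):
                b[s] *= msgs[(k, i)][s]
        z = sum(b)
        beliefs[i] = [x / z for x in b]
    return beliefs


# Toy example (hypothetical data): a triangle, so the graph has a loop.
# Vertex 0 has partial knowledge favouring state 0; the others are unknown.
toy_edges = [(0, 1), (1, 2), (2, 0)]
toy_priors = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}
toy_psi = [[0.8, 0.2], [0.2, 0.8]]  # neighbours tend to share a state
toy_beliefs = loopy_bp(toy_edges, toy_priors, toy_psi)
```

In this toy setting the vertices with uninformative priors inherit a preference for state 0 from their neighbour, which is the "inference from partial knowledge" the abstract describes.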
