Using Pregel-like Large Scale Graph Processing Frameworks for Social Network Analysis

Pregel is a system for large-scale graph processing developed at Google. It provides a scalable framework for running graph analytics on clusters of commodity machines. In this paper, we present several important undirected graph algorithms for social network analysis that fit within this framework. We discuss various graph componentisation methods, diameter estimation, and degrees of separation, along with finding triangles, k-cores and k-trusses and computing clustering coefficients. Finally, we present some experimental results using our own implementation of the Pregel framework, and examine key features of the general framework and of algorithmic design within it.
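To make the vertex-centric model concrete, the following is a minimal single-machine sketch of one componentisation method, connected components by minimum-label propagation, written as a sequence of synchronous supersteps. It is an illustration only and not the implementation evaluated in the paper: the function name `connected_components`, the dictionary-based adjacency format, and the sequential message loop are all assumptions made for the example, and the input is assumed to be an undirected graph in which every edge is listed in both directions.

```python
from collections import defaultdict


def connected_components(adjacency):
    """Label every vertex with the smallest vertex id in its component.

    `adjacency` maps each vertex id to a list of neighbour ids and is
    assumed to be symmetric (undirected graph). Hypothetical helper,
    written only to illustrate the superstep model.
    """
    # Superstep 0: each vertex takes its own id as its label and
    # sends that label to all of its neighbours.
    labels = {v: v for v in adjacency}
    inbox = defaultdict(list)
    for v, neighbours in adjacency.items():
        for u in neighbours:
            inbox[u].append(labels[v])
    supersteps = 1

    # Later supersteps: a vertex is woken only if it received messages.
    while inbox:
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < labels[v]:
                # Label improved: adopt it and notify the neighbours.
                labels[v] = smallest
                for u in adjacency[v]:
                    outbox[u].append(smallest)
            # Otherwise the vertex sends nothing (it votes to halt).
        inbox = outbox  # barrier: messages are delivered next superstep
        supersteps += 1

    return labels, supersteps


if __name__ == "__main__":
    graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
    labels, steps = connected_components(graph)
    print(labels, steps)
```

In a distributed setting the per-vertex body of the loop would run in parallel on the workers and the swap of `outbox` into `inbox` would correspond to the synchronisation barrier between supersteps; the computation ends when no messages remain in flight.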