Giraphx: Parallel Yet Serializable Large-Scale Graph Processing

Bulk Synchronous Parallelism (BSP) is a good model for parallel processing of many large-scale graph applications; however, it is unsuitable or inefficient for graph applications that require coordination, such as graph coloring, subcoloring, and clustering. To address this problem, we present an efficient modification to the BSP model that implements serializability (sequential consistency) without reducing the highly parallel nature of BSP. Our modification bypasses the message queues in BSP and reads directly from the worker's memory for internal vertex executions. To ensure serializability, coordination (implemented via dining philosophers or a token ring) is performed only for border vertices partitioned across workers. We implement our modifications to BSP on Giraph, an open-source clone of Google's Pregel. We show through a graph-coloring application that our modified framework, Giraphx, provides much better performance than implementing the application using dining philosophers over Giraph. In fact, Giraphx outperforms Giraph even for embarrassingly parallel applications that do not require coordination, e.g., PageRank.
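
The split between internal and border vertices can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the Giraphx implementation: the function name `color_with_token_ring`, the `ADJ` and `OWNER` example data, the greedy smallest-free-color rule, and the round-robin token schedule are all hypothetical choices made for illustration. Internal vertices (all neighbors on the same worker) read neighbor state directly from shared memory in every superstep, while border vertices update only when their worker holds the token.

```python
from typing import Dict, List, Optional, Tuple


def color_with_token_ring(
    adj: Dict[int, List[int]],
    owner: Dict[int, int],
    num_workers: int,
) -> Tuple[Dict[int, Optional[int]], int]:
    """Greedy graph coloring, simulated superstep by superstep.

    Assumption: workers are simulated one after another in a single
    process. This matches a parallel run here because non-token workers
    touch only internal vertices, which have no remote neighbors.
    """
    color: Dict[int, Optional[int]] = {v: None for v in adj}

    def is_border(v: int) -> bool:
        # A border vertex has at least one neighbor on another worker.
        return any(owner[u] != owner[v] for u in adj[v])

    def smallest_free_color(v: int) -> int:
        # Read neighbor colors directly (no message queue in this sketch).
        used = {color[u] for u in adj[v] if color[u] is not None}
        c = 0
        while c in used:
            c += 1
        return c

    superstep = 0
    while any(c is None for c in color.values()):
        # Hypothetical schedule: the token moves round-robin each superstep.
        token_holder = superstep % num_workers
        for w in range(num_workers):
            for v in sorted(adj):
                if owner[v] != w or color[v] is not None:
                    continue
                # Border vertices wait until their worker holds the token.
                if is_border(v) and w != token_holder:
                    continue
                color[v] = smallest_free_color(v)
        superstep += 1
    return color, superstep


# Example: a triangle (0, 1, 2) on worker 0 and a path (3, 4) on worker 1,
# joined by the cross-worker edge 2-3, so 2 and 3 are border vertices.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
OWNER = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
coloring, supersteps = color_with_token_ring(ADJ, OWNER, num_workers=2)
```

In this sketch, two adjacent border vertices on different workers never update in the same superstep, so no two neighbors can pick the same color concurrently. Every worker holds the token once per round, so the loop finishes within `num_workers` supersteps.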
