LCC-Graph: A high-performance graph-processing framework with low communication costs

With the rapid growth of data, communication overhead has become an important concern in data-center and cloud-computing applications. Existing distributed graph-processing frameworks routinely suffer from high communication costs, which force users to wait a long time for graph-computing results. To address this problem, we propose a new computation model with low communication costs, called LCC-BSP, and use it to design and implement a high-performance distributed graph-processing framework called LCC-Graph. This framework eliminates the high communication costs found in existing distributed graph-processing frameworks. LCC-Graph also minimizes the computation workload of each vertex, significantly reducing the computation time of each superstep. An evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in runtime, particularly when the system is supported by a high-bandwidth network. For example, LCC-Graph achieves an order-of-magnitude performance improvement over GPS and GraphLab.
