Evaluation and Analysis of Distributed Graph-Parallel Processing Frameworks

In recent years, a number of graph-parallel processing frameworks have been proposed to meet the need to process complex, large-scale graph-structured datasets. Although significant performance improvements have been reported for these frameworks, their comparative advantages over one another have not been fully studied, which makes it difficult to choose the best framework for a specific graph computing task and setting. In this work, we conducted a systematic comparative study of parallel processing systems for large-scale graph computation, aiming to reveal how these systems perform common graph algorithms on real-world datasets on equal footing. We selected three popular graph-parallel processing frameworks (Giraph, GPS and GraphLab) for the study and also included a representative general data-parallel computing system, Spark, in the comparison in order to understand how well a general data-parallel system can run graph problems. We applied basic performance
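One common graph algorithm, PageRank, makes the contrast between graph-parallel frameworks and a general data-parallel system concrete. The sketch below is a hedged illustration and is not code from Giraph, GPS, GraphLab or Spark. It computes PageRank twice in plain, single-process Python. The first version uses the vertex-centric, superstep style that Pregel-derived systems such as Giraph and GPS expose. The second expresses each iteration as a chain of collection operations (group-by-key, join, reduce-by-key) of the kind a Spark program would use. All function names are hypothetical. The damping factor of 0.85 is an assumed conventional value, and the graph is assumed to have no dangling vertices (every vertex has at least one out-edge).

```python
from collections import defaultdict

DAMPING = 0.85  # assumed, conventional PageRank damping factor


def vertex_centric_pagerank(edges, num_vertices, supersteps):
    """Pregel-style sketch: in each superstep every vertex sends
    rank / out_degree to its out-neighbours; after the barrier, each
    vertex recomputes its rank from the messages it received."""
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)
    rank = {v: 1.0 / num_vertices for v in range(num_vertices)}
    for _ in range(supersteps):
        inbox = defaultdict(float)  # combined messages, delivered at the barrier
        for v in range(num_vertices):
            for dst in out_edges[v]:
                inbox[dst] += rank[v] / len(out_edges[v])
        rank = {v: (1 - DAMPING) / num_vertices + DAMPING * inbox[v]
                for v in range(num_vertices)}
    return rank


# Plain-Python stand-ins named after Spark RDD operations; not Spark itself.
def group_by_key(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return list(groups.items())


def join(left, right):
    right_map = defaultdict(list)
    for k, v in right:
        right_map[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in right_map.get(k, [])]


def reduce_by_key(pairs):
    acc = defaultdict(float)
    for k, v in pairs:
        acc[k] += v
    return list(acc.items())


def data_parallel_pagerank(edges, num_vertices, iterations):
    """Dataflow sketch: the graph is a collection of (src, [dst...]) records,
    and each iteration is a join, a flat map and a keyed aggregation."""
    links = group_by_key(edges)
    ranks = [(v, 1.0 / num_vertices) for v in range(num_vertices)]
    for _ in range(iterations):
        joined = join(links, ranks)  # (src, ([dst...], rank))
        contribs = [(dst, r / len(dsts)) for _, (dsts, r) in joined for dst in dsts]
        summed = dict(reduce_by_key(contribs))
        # Iterate over all vertices so those with no in-edges keep a base rank.
        ranks = [(v, (1 - DAMPING) / num_vertices + DAMPING * summed.get(v, 0.0))
                 for v in range(num_vertices)]
    return dict(ranks)
```

Consider a small graph such as `edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]` with four vertices, where every vertex has an out-edge. On it, the two functions should return matching ranks (up to floating-point rounding) that sum to 1. The difference between them lies in the execution model, not the result. The vertex-centric version keeps state in the vertices and exchanges messages between supersteps. The data-parallel version rebuilds the per-vertex contributions from joined collections in every iteration.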
