DRONE: a Distributed gRaph cOmputiNg Engine

In the big data era, social networks, graph databases, knowledge graphs, e-commerce, and similar applications demand efficient and scalable processing of ever-increasing volumes of graph-structured data. To meet this challenge, two mainstream distributed programming models, vertex-centric (VC) and subgraph-centric (SC), have been proposed. Compared to the VC model, the SC model converges faster with less communication overhead on well-partitioned graphs, and is easy to program thanks to its "think like a graph" philosophy. The edge-cut method is considered a natural choice for graph partitioning in the subgraph-centric model and is adopted by Giraph++, Blogel, and GRAPE. However, edge-cut partitioning causes a significant performance bottleneck when preprocessing large graphs, especially power-law graphs. As a result, the SC model is less competitive in practice. In this paper, we present an innovative distributed graph computing framework, DRONE (Distributed gRaph cOmputiNg Engine), which combines the subgraph-centric model with the vertex-cut graph partitioning strategy. Experiments show that DRONE outperforms state-of-the-art distributed graph computing engines on real-world graphs and synthetic power-law graphs. DRONE can scale up to process synthetic power-law graphs with one trillion edges, which is orders of magnitude larger than previously reported for existing SC-based frameworks.
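
The contrast between edge-cut and vertex-cut partitioning on skewed graphs can be illustrated with a toy example. The sketch below is not DRONE's partitioner; the function names, the modulo-based vertex placement, and the round-robin edge placement are all assumptions chosen for brevity. It uses a star graph (one hub, eight leaves) as a stand-in for a high-degree vertex in a power-law graph.

```python
from collections import defaultdict

def edge_cut(edges, k):
    """Hypothetical edge-cut: vertex v lives on partition v % k,
    and each edge is stored with its source vertex."""
    load = [0] * k
    cut = 0
    for u, v in edges:
        load[u % k] += 1
        if u % k != v % k:
            cut += 1  # endpoints on different partitions
    return load, cut

def vertex_cut(edges, k):
    """Hypothetical vertex-cut: edges are dealt round-robin to partitions,
    and a vertex is replicated on every partition holding one of its edges."""
    load = [0] * k
    replicas = defaultdict(set)
    for i, (u, v) in enumerate(edges):
        p = i % k
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
    replication_factor = sum(len(s) for s in replicas.values()) / len(replicas)
    return load, replication_factor

# Star graph: hub 0 connected to leaves 1..8.
edges = [(0, i) for i in range(1, 9)]
ec_load, ec_cut = edge_cut(edges, 4)
vc_load, vc_rf = vertex_cut(edges, 4)
print("edge-cut   load per partition:", ec_load, "cut edges:", ec_cut)
print("vertex-cut load per partition:", vc_load, "replication factor:", round(vc_rf, 2))
```

In this toy case the edge-cut placement puts all eight edges on the hub's partition, while the vertex-cut placement spreads them evenly at the cost of replicating the hub on every partition. Real partitioners use far more careful heuristics, so the numbers here only show the direction of the trade-off.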
