DD-Graph: A Highly Cost-Effective Distributed Disk-based Graph-Processing Framework

Existing distributed graph-processing frameworks, e.g., GPS, Pregel, and Giraph, process large-scale graphs in the memory of clusters built from commodity compute nodes to achieve better scalability and performance. These frameworks can scale out with graph size to thousands of compute nodes. However, once graphs exceed a certain size, they usually require an investment in machines that is either beyond the financial means of most small and medium-sized organizations or unprofitable for them. At the other end of the spectrum of graph-processing research, single-node disk-based graph-processing frameworks, e.g., GraphChi, handle large-scale graphs on a single commodity computer. They use hardware efficiently, but at the cost of low user-perceived performance and limited scalability. Motivated by this dichotomy, in this paper we propose DD-Graph, a distributed disk-based graph-processing framework that can process super-large graphs on a small cluster while achieving the high performance of existing distributed in-memory graph-processing frameworks.
