GraphCP: An I/O-Efficient Concurrent Graph Processing Framework

Big data applications increasingly rely on the analysis of large graphs. To analyze and process large graphs cost-efficiently, researchers have in recent years developed a number of out-of-core graph processing systems that run on a single commodity computer. At the same time, with the rapidly growing demand for real-world graph analysis, graph processing systems must efficiently handle massive numbers of concurrent graph processing (CGP) jobs. Unfortunately, because they are inherently designed for a single graph processing job, existing out-of-core graph processing systems usually incur redundant data accesses and storage, as well as severe contention for I/O bandwidth, when handling CGP jobs, which leads to very long waiting times for users awaiting computing results. In this paper, we propose GraphCP, an I/O-efficient out-of-core graph processing system that supports the processing of CGP jobs. GraphCP employs a benefit-aware sharing execution model that shares the I/O access and processing of graph data among CGP jobs and adaptively schedules the loading of graph data, thereby overcoming the above challenges faced by existing out-of-core graph processing systems. In addition, GraphCP organizes graph data in a Source-Sorted Sub-Block graph representation to improve processing capacity and I/O access locality. Extensive evaluation results show that GraphCP is 10.3x and 4.6x faster than two state-of-the-art out-of-core graph processing systems, GridGraph and GraphZ, respectively, and 2.1x faster than Seraph, a CGP-oriented graph processing system.
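The core idea of sharing graph data across concurrent jobs can be illustrated with a small sketch. It is not GraphCP's implementation. It assumes a GridGraph-style p x p grid of sub-blocks whose edges are sorted by source vertex (one plausible reading of "Source-Sorted Sub-Block"; the actual on-disk layout may differ), uses breadth-first search (BFS) as the concurrent job, and counts each access to a sub-block as one "load" standing in for disk I/O. The helper names `build_blocks`, `relax`, and `concurrent_bfs` are hypothetical. The benefit-aware scheduling policy is not modeled; the sketch only skips sub-blocks that no active job needs.

```python
# Hypothetical sketch of shared sub-block loading for concurrent BFS jobs;
# not GraphCP's actual implementation.
from bisect import bisect_left
from collections import defaultdict


def build_blocks(edges, num_vertices, p):
    """Partition edges into a p x p grid of sub-blocks, each sorted by source vertex."""
    size = -(-num_vertices // p)  # vertices per interval (ceiling division)
    grid = defaultdict(list)
    for src, dst in edges:
        grid[(src // size, dst // size)].append((src, dst))
    return {key: sorted(block) for key, block in grid.items()}, size


def relax(block, sources, level, next_frontier, step):
    """Expand BFS frontier vertices through one source-sorted sub-block."""
    for src in sorted(sources):
        # Source-sorted order lets us jump directly to src's first edge.
        i = bisect_left(block, (src, -1))
        while i < len(block) and block[i][0] == src:
            dst = block[i][1]
            if dst not in level:
                level[dst] = step
                next_frontier.add(dst)
            i += 1


def concurrent_bfs(blocks, size, roots, shared=True):
    """Run one BFS job per root; return (levels per job, number of sub-block loads)."""
    levels = [{r: 0} for r in roots]
    frontiers = [{r} for r in roots]
    loads, step = 0, 0
    while any(frontiers):
        step += 1
        next_frontiers = [set() for _ in roots]
        if shared:
            # Shared mode: each needed sub-block is loaded once per iteration and
            # processed by every job whose frontier touches its source interval.
            for (si, _), block in blocks.items():
                users = [k for k, f in enumerate(frontiers)
                         if any(v // size == si for v in f)]
                if not users:
                    continue  # no active job needs this sub-block
                loads += 1
                for k in users:
                    srcs = {v for v in frontiers[k] if v // size == si}
                    relax(block, srcs, levels[k], next_frontiers[k], step)
        else:
            # Independent mode: every job loads the sub-blocks it needs on its own.
            for k, frontier in enumerate(frontiers):
                for (si, _), block in blocks.items():
                    srcs = {v for v in frontier if v // size == si}
                    if srcs:
                        loads += 1
                        relax(block, srcs, levels[k], next_frontiers[k], step)
        frontiers = next_frontiers
    return levels, loads


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (4, 5), (5, 6), (6, 7), (7, 4), (2, 6)]
blocks, size = build_blocks(edges, num_vertices=8, p=2)
shared_levels, shared_loads = concurrent_bfs(blocks, size, roots=[0, 4], shared=True)
solo_levels, solo_loads = concurrent_bfs(blocks, size, roots=[0, 4], shared=False)
print(shared_levels == solo_levels, shared_loads, solo_loads)  # True 13 16
```

Both modes compute identical BFS levels, but in shared mode a sub-block needed by both jobs in the same iteration is loaded once instead of twice. This is the kind of redundant I/O that the abstract attributes to single-job designs, and the savings grow as more concurrent jobs touch the same sub-blocks.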
