论文信息 - Pimiento: A Vertex-Centric Graph-Processing Framework on a Single Machine

Pimiento: A Vertex-Centric Graph-Processing Framework on a Single Machine

Here, we describe a method for handling large graphs with data sizes exceeding memory capacity using minimal hardware resources. This method (called Pimiento) is a vertex-centric graph-processing framework on a single machine and represents a semi-external graph-computing system, where all vertices are stored in memory, and all edges are stored externally in compressed sparse row data-storage format. Pimiento uses a multi-core CPU, memory, and multi-threaded data preprocessing to optimize disk I/O in order to reduce random-access overhead in the graph-algorithm implementation process. An on-the-fly update-accumulated mechanism was designed to reduce the time that the graph algorithm accesses disks during execution. Our experiments compared external this method with other graph-processing systems, including GraphChi, X-Stream, and FlashGraph, revealing that Pimiento achieved \(7.5\times , 4\times , 1.6\times \) better performance on large real-world graphs and synthetic graphs in the same experimental environment.

[1] Kisung Lee,et al. Fast Iterative Graph Computation: A Path Centric Approach , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.

[2] Yu Wang,et al. NXgraph: An efficient graph processing system on a single machine , 2015, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[3] Zhenguo Li,et al. VENUS: Vertex-centric streamlined graph computation on a single PC , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[4] Hai Jin,et al. FOG: A Fast Out-of-Core Graph Processing Framework , 2017, International Journal of Parallel Programming.

[5] Aart J. C. Bik,et al. Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[6] Joseph Gonzalez,et al. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[7] Vivek S. Pai,et al. SSDAlloc: Hybrid SSD/RAM Memory Management Made Easy , 2011, NSDI.

[8] Wenguang Chen,et al. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning , 2015, USENIX ATC.

[9] Sebastiano Vigna,et al. A large time-aware web graph , 2008, SIGF.

[10] Rajiv Gupta,et al. Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing , 2016, USENIX Annual Technical Conference.

[11] Joseph M. Hellerstein,et al. Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[12] Nancy M. Amato,et al. Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[13] Hosung Park,et al. What is Twitter, a social network or a news media? , 2010, WWW '10.

[14] Willy Zwaenepoel,et al. X-Stream: edge-centric graph processing using streaming partitions , 2013, SOSP.

[15] Wenguang Chen,et al. Gemini: A Computation-Centric Distributed Graph Processing System , 2016, OSDI.

[16] Lixin Gao,et al. Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation , 2017, 1710.05785.

[17] Alexander S. Szalay,et al. FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs , 2014, FAST.

[18] Guy E. Blelloch,et al. GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[19] Minsuk Kahng,et al. MMap: Fast billion-scale graph computation on a PC via memory mapping , 2014, 2014 IEEE International Conference on Big Data (Big Data).

[20] Jinha Kim,et al. TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC , 2013, KDD.