XTree: Traversal-Based Partitioning for Extreme-Scale Graph Processing on Supercomputers

Graph algorithms such as Breadth-First Search (BFS), Single-Source Shortest Path (SSSP), PageRank (PR), and Connected Components (CC) are increasingly important in big data processing and analytics. Graph scales (numbers of vertices and edges) have grown from billions to trillions. Supercomputers have huge numbers (up to hundreds of thousands) of computing nodes (CNs) that provide ultra-high aggregate computing power and memory capacity, which makes them particularly suitable for processing extreme-scale graphs with trillions of vertices and edges. However, existing cluster-based graph-parallel systems perform poorly when deployed on supercomputers, since their partitioning methods overlook the hierarchical nature of supercomputer networks and incur prohibitive communication storms. This paper presents XTree, an efficient traversal-based partitioning method for minimizing the communication overhead of graph processing on supercomputers. We observe that a supercomputer's CNs are usually organized into hierarchical communication domains, which can be modeled as a domain tree in which communication in lower-level domains is significantly faster than in higher-level ones. Therefore, the key idea of XTree's partitioning is to exploit hierarchical locality by viewing the graph as a BFS tree and leveraging topology knowledge to map the graph's BFS tree onto the domain tree. We evaluate the effectiveness of XTree by running various graph algorithms on both real-world big graphs and synthetic trillion-scale graphs. XTree substantially reduces communication overhead and achieves orders-of-magnitude speedups over the Graph500 reference implementations with state-of-the-art 2D-decomposition partitioning.
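To make the idea of hierarchical locality concrete, the following is a minimal sketch and not XTree's actual algorithm, which the abstract does not specify. It assumes the domain tree is given as a nested list whose leaves are CN names, and it simply cuts the BFS visiting order into equal contiguous chunks, one per leaf, so that vertices close together in the BFS order land in the same or a nearby low-level domain. The function names (`bfs_order`, `domain_leaves`, `partition`) and the toy graph are hypothetical.

```python
from collections import deque


def bfs_order(adj, root):
    """Return the vertices reachable from root in BFS visiting order."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order


def domain_leaves(tree):
    """Flatten a nested-list domain tree into its leaf CNs, left to right."""
    if not isinstance(tree, list):
        return [tree]
    leaves = []
    for child in tree:
        leaves.extend(domain_leaves(child))
    return leaves


def partition(adj, root, tree):
    """Assign contiguous chunks of the BFS order to consecutive leaf CNs.

    Assumption for illustration only: equal-sized chunks, no load
    balancing by edge count, and a connected graph.
    """
    order = bfs_order(adj, root)
    cns = domain_leaves(tree)
    chunk = -(-len(order) // len(cns))  # ceiling division
    return {v: cns[i // chunk] for i, v in enumerate(order)}


# Toy undirected graph (adjacency lists) and a two-level domain tree:
# two low-level domains, each holding two CNs.
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5, 6], 3: [1, 7],
       4: [1], 5: [2], 6: [2], 7: [3]}
tree = [["cn0", "cn1"], ["cn2", "cn3"]]
assignment = partition(adj, 0, tree)
```

In this toy case the BFS order is 0 through 7, so vertices 0 to 3 fall in the first low-level domain and vertices 4 to 7 in the second. A realistic partitioner would also have to balance edges across CNs and handle disconnected components, which this sketch ignores.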
