Paper information - Extreme scale breadth-first search on supercomputers

Extreme scale breadth-first search on supercomputers

Breadth-First Search (BFS) is one of the most fundamental graph algorithms and is used as a component of many other graph algorithms. Our new method for distributed parallel BFS can compute BFS on a graph with one trillion vertices within half a second, using large supercomputers such as the K-Computer. Using our proposed algorithm, the K-Computer was ranked 1st in Graph500 in June and November 2015 and in June 2016, using all 82,944 available nodes, with a score of 38,621.4 GTEPS. Based on the hybrid-BFS algorithm by Beamer [3], we devise a set of optimizations for scaling to an extreme number of nodes, including a new efficient graph data structure and optimization techniques such as vertex reordering and load balancing. Performance evaluation on the K-Computer shows that our new BFS is 3.19 times faster on 30,720 nodes than the base version using the previously best-known techniques.
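The hybrid (direction-optimizing) BFS that the abstract builds on can be illustrated with a small single-process sketch. This is not the paper's distributed implementation: the function name `hybrid_bfs`, the dictionary-based adjacency list, and the switching rule with its `alpha` parameter are assumptions made for illustration, and the graph is assumed to be undirected (every edge stored in both directions).

```python
from typing import Dict, List


def hybrid_bfs(adj: Dict[int, List[int]], source: int, alpha: int = 14) -> Dict[int, int]:
    """Return a BFS parent map for the vertices reachable from `source`.

    Each level runs either top-down (scan edges leaving the frontier) or
    bottom-up (each unvisited vertex looks for a parent in the frontier).
    The switching rule and the default `alpha` are illustrative assumptions.
    """
    parent = {source: source}  # the root is its own parent
    frontier = {source}

    while frontier:
        # Work estimates: edges leaving the frontier vs. edges of unvisited vertices.
        frontier_edges = sum(len(adj[v]) for v in frontier)
        unvisited = [v for v in adj if v not in parent]
        unexplored_edges = sum(len(adj[v]) for v in unvisited)

        next_frontier = set()
        if frontier_edges * alpha > unexplored_edges:
            # Bottom-up step: stop scanning a vertex as soon as a parent is found.
            for v in unvisited:
                for u in adj[v]:
                    if u in frontier:
                        parent[v] = u
                        next_frontier.add(v)
                        break
        else:
            # Top-down step: the conventional frontier expansion.
            for u in frontier:
                for v in adj[u]:
                    if v not in parent:
                        parent[v] = u
                        next_frontier.add(v)
        frontier = next_frontier

    return parent


if __name__ == "__main__":
    # Small undirected example graph; vertex 5 is isolated.
    graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
    print(hybrid_bfs(graph, 0))
```

The bottom-up step pays off when the frontier is large, because most unvisited vertices find a parent after inspecting only a few edges; the top-down step is cheaper when the frontier is small. Setting `alpha=0` in this sketch forces top-down steps throughout, which gives a plain BFS for comparison.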

Satoshi Matsuoka | Koji Ueno | Toyotaro Suzumura | Katsuki Fujisawa | Naoya Maruyama

[1] Koji Ueno, et al. Parallel distributed breadth first search on GPU, 2013, 20th Annual International Conference on High Performance Computing.

[2] Koji Ueno, et al. Highly scalable graph search for the Graph500 benchmark, 2012, HPDC '12.

[3] David A. Patterson, et al. Direction-optimizing breadth-first search, 2012, HiPC 2012.

[4] Christos Faloutsos, et al. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, 2005, PKDD.

[5] Edmond Chow, et al. A Scalable Distributed Parallel Breadth-First Search Algorithm on BlueGene/L, 2005, ACM/IEEE SC 2005 Conference (SC'05).

[6] Katsuki Fujisawa, et al. Fast and Energy-efficient Breadth-First Search on a Single NUMA System, 2014, ISC.

[7] David A. Patterson, et al. Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[8] Fabio Checconi, et al. Traversing Trillions of Edges in Real Time: Graph Exploration on Large-Scale Parallel Machines, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[9] Pradeep Dubey, et al. Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[10] John R. Gilbert, et al. On the representation and multiplication of hypersparse matrices, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[11] Eurípides Montagne, et al. An optimal storage format for sparse matrices, 2004, Inf. Process. Lett.

[12] Tomohiro Inoue, et al. The Tofu Interconnect, 2011, 2011 IEEE 19th Annual Symposium on High Performance Interconnects.

[13] Fabio Checconi, et al. Breaking the speed and scalability Barriers for Graph exploration on distributed-memory machines, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[14] David A. Patterson, et al. Direction-optimizing Breadth-First Search, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[15] Satoshi Matsuoka, et al. Performance characteristics of Graph500 on large-scale distributed environment, 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).

[16] Kamesh Madduri, et al. Parallel breadth-first search on distributed memory systems, 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[17] Tomohiro Inoue, et al. The Tofu Interconnect, 2012, IEEE Micro.

[18] Fumiyoshi Shoji, et al. The K computer: Japanese next-generation supercomputer development project, 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.