Breadth-First Search with A Multi-Core Computer

Breadth-first search (BFS) is a building block of many graph algorithms. Because BFS is memory-bound, parallelizing it on a multi-core computer requires attention to data hazards, the effect of atomic operations on memory throughput, and the size of the last-level cache. Graph algorithms must also cope with non-sequential memory access, which defeats cache prefetching and leads to a high cache miss rate. This article describes how to limit the maximum size of the data structure, how to perform parallel BFS without atomic operations, how to increase the proportion of sequential memory access, and how to reduce cache contention. These techniques have appeared in various forms in the literature; the present work combines them in a simple way that performs well. Three leading graph-processing platforms (Gunrock, Ligra, and Polymer) are used for comparison. When executed on the same machine, Ligra is the fastest of the three. The implementation described herein is always faster than Ligra, and is more than twice as fast for large graphs. In particular, for the graph RMat26, it runs at 3.11 times the speed of Ligra.
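
To make the idea of parallel BFS without atomic operations concrete, here is a minimal C++ sketch of one well-known way to achieve it: a level-synchronous, bottom-up traversal in which each thread owns a contiguous range of vertices and writes only to entries in that range. It is not necessarily the scheme used in this article. The `CsrGraph` type, the function name `bfs_owner_writes`, and the assumption of an undirected graph stored in compressed sparse row (CSR) form are all hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CSR graph type for this sketch (not taken from the article).
// The graph is assumed undirected: every edge appears in both adjacency lists.
struct CsrGraph {
    std::vector<std::size_t> offsets;    // size n + 1
    std::vector<std::uint32_t> targets;  // concatenated neighbor lists
    std::size_t num_vertices() const { return offsets.size() - 1; }
};

// Level-synchronous, bottom-up BFS. Returns the BFS level of every vertex,
// or -1 for vertices that are unreachable from `source`.
std::vector<int> bfs_owner_writes(const CsrGraph& g, std::uint32_t source,
                                  unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t n = g.num_vertices();
    assert(source < n);

    std::vector<int> level(n, -1);
    // One byte per vertex (not vector<bool>), so two threads never write
    // to the same memory location.
    std::vector<std::uint8_t> frontier(n, 0), next(n, 0);
    std::vector<std::uint8_t> found(num_threads, 0);  // one flag per thread
    level[source] = 0;
    frontier[source] = 1;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;

    for (int depth = 0;; ++depth) {
        std::fill(found.begin(), found.end(), 0);
        std::vector<std::thread> workers;
        for (unsigned t = 0; t < num_threads; ++t) {
            workers.emplace_back([&, t] {
                const std::size_t begin = std::min(n, t * chunk);
                const std::size_t end = std::min(n, begin + chunk);
                for (std::size_t v = begin; v < end; ++v) {
                    next[v] = 0;                 // clear stale flag; v is ours
                    if (level[v] != -1) continue;
                    // `frontier` is read-only during this level.
                    for (std::size_t e = g.offsets[v]; e < g.offsets[v + 1]; ++e) {
                        if (frontier[g.targets[e]]) {
                            level[v] = depth + 1;  // only the owner writes v
                            next[v] = 1;
                            found[t] = 1;
                            break;
                        }
                    }
                }
            });
        }
        for (auto& w : workers) w.join();  // joining acts as the level barrier
        if (std::none_of(found.begin(), found.end(),
                         [](std::uint8_t f) { return f != 0; })) break;
        frontier.swap(next);
    }
    return level;
}
```

Because the current frontier is only read during a level and every write goes to a vertex owned by the writing thread, no atomic instruction or lock is needed. The sketch trades this for scanning every unvisited vertex at each level, and it spawns threads per level for brevity; a tuned implementation would presumably keep a persistent thread pool and a more compact frontier representation.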
