TOPOLOGICALLY ADAPTIVE PARALLEL BREADTH-FIRST SEARCH ON MULTICORE PROCESSORS

Breadth-first search (BFS) is a fundamental graph algorithm that is widely used to abstract a variety of challenging computational problems. Because of its fine-grained, irregular memory accesses, parallel BFS can exhibit limited performance on cache-based systems. In this paper, we study the relationship between the topology of input graphs and the performance of BFS on multicore systems. We propose a model to estimate the scalability of BFS with respect to a given graph. Using this model, we propose a topologically adaptive parallel BFS algorithm for multicore systems. The proposed algorithm estimates, at runtime, the scalability of each BFS iteration with respect to the input graph. We develop an adaptive barrier for this algorithm that dynamically adjusts the number of threads participating in BFS according to the estimated scalability, thereby reducing synchronization overhead. We evaluate the proposed algorithm using various graphs on state-of-the-art multicore systems. The proposed method exhibits improved performance compared with traditional parallel BFS algorithms that use a fixed number of threads.
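
The idea of adjusting the thread count per BFS iteration can be illustrated with a minimal sketch. It is not the algorithm from the paper: the scalability model is not given in the abstract, so the function `choose_threads` and its `work_per_thread` parameter are hypothetical stand-ins that simply scale the thread count with the frontier size. The names `adaptive_bfs` and `expand` are likewise assumptions, and the implicit join at the end of each `pool.map` call plays the role of the per-level barrier. Python threads do not give real speedup for this workload, so the sketch shows only the control flow.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_threads(frontier_size, max_threads, work_per_thread=64):
    # Hypothetical heuristic (NOT the paper's model): allow one thread
    # per `work_per_thread` frontier vertices, capped at `max_threads`.
    return max(1, min(max_threads, frontier_size // work_per_thread))


def expand(graph, chunk):
    # Collect the neighbours of every vertex in one chunk of the frontier.
    found = []
    for u in chunk:
        found.extend(graph[u])
    return found


def adaptive_bfs(graph, source, max_threads=4, work_per_thread=64):
    # Level-synchronous BFS; the thread count is re-chosen at every level.
    level = {source: 0}
    frontier = [source]
    threads_used = []
    depth = 0
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        while frontier:
            t = choose_threads(len(frontier), max_threads, work_per_thread)
            threads_used.append(t)
            if t == 1:
                # Small frontier: run serially and skip synchronization.
                results = [expand(graph, frontier)]
            else:
                # Split the frontier into t chunks; pool.map returns only
                # after all chunks finish, acting as the level barrier.
                chunks = [frontier[i::t] for i in range(t)]
                results = list(pool.map(lambda c: expand(graph, c), chunks))
            depth += 1
            next_frontier = []
            # Merge serially so that each vertex is assigned a level once.
            for found in results:
                for v in found:
                    if v not in level:
                        level[v] = depth
                        next_frontier.append(v)
            frontier = next_frontier
    return level, threads_used


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(adaptive_bfs(graph, 0, max_threads=2, work_per_thread=1))
```

In this sketch a narrow frontier is processed by a single thread, which avoids the cost of synchronizing threads that would have little work, while a wide frontier is split across up to `max_threads` workers.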
