Computing Strongly Connected Components in Parallel on CUDA

Decomposing a directed graph into its strongly connected components is a fundamental graph problem that arises in many scientific and commercial applications. In this paper we show how some existing parallel algorithms can be reformulated so that they can be accelerated with NVIDIA CUDA technology. In particular, we design a new CUDA-aware procedure for pivot selection and adapt selected parallel algorithms for CUDA-accelerated computation. We also demonstrate experimentally that a single GTX 480 GPU card outperforms the optimal serial CPU implementation by an order of magnitude in most cases, and by a factor of 40 on some sufficiently large instances. This result is notable because, unlike the serial CPU algorithm, the parallel algorithms do not have optimal asymptotic complexity.
