Parallel Graph Partitioning on a CPU-GPU Architecture

Graph partitioning has important applications in multiple areas of computing, including scheduling, social networks, and parallel processing. In recent years, GPUs have proven successful at accelerating several graph algorithms. However, the irregular nature of real-world graphs poses a problem for GPUs, which favor regularity. In this paper, we discuss the design and implementation of a parallel multilevel graph partitioner for a CPU-GPU system. The partitioner aims to overcome some of the challenges that arise from memory constraints on GPUs and to maximize the utilization of GPU threads through suitable load-balancing schemes. Because fine-grained synchronization among thousands of threads imposes too high a performance overhead, we present a lock-free shared-memory scheme. The partitioner, implemented in CUDA, outperforms serial Metis and parallel MPI-based ParMetis, and performs similarly to the shared-memory CPU-based parallel graph partitioner mt-metis.
