Communication-avoiding parallel minimum cuts and connected components

We present novel scalable parallel algorithms for finding global minimum cuts and connected components, two fundamental problems in graph processing. To take advantage of future massively parallel architectures, our algorithms are communication-avoiding: they reduce the costs of communication across the network and the cache hierarchy. The fundamental technique underlying our work is the randomized sparsification of a graph: removing a fraction of the graph's edges, deriving a solution for the sparsified graph, and using that result to obtain a solution for the original input. We design and implement sparsification with O(1) synchronization steps. Our global minimum cut algorithm reduces both communication cost and computation compared to the state of the art, while our connected components algorithm incurs few cache misses and synchronization steps. We validate our approach by evaluating MPI implementations of the algorithms on a petascale supercomputer. We also provide an approximate variant of the minimum cut algorithm and show that it approximates the exact solutions well while using a fraction of the cores and a fraction of the time.
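The sparsification idea can be illustrated with a minimal sequential sketch for connected components. This is not the paper's parallel MPI algorithm: the function names, the `keep_prob` parameter, and the use of a union-find structure are assumptions made purely for illustration. The sketch samples a random subset of edges, computes components on that sparsified graph, and then uses the partial labels to discard the remaining edges that fall inside an already known component before finishing on the contracted graph.

```python
import random


def find(parent, x):
    """Return the representative of x, compressing the path by halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union_edges(parent, edges):
    """Merge the endpoints of every edge in the union-find structure."""
    for u, v in edges:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv


def components_via_sparsification(n, edges, keep_prob=0.5, seed=0):
    """Label connected components of an n-vertex graph (illustrative only).

    keep_prob and seed are hypothetical parameters of this sketch.
    """
    rng = random.Random(seed)
    sampled, rest = [], []
    for e in edges:
        (sampled if rng.random() < keep_prob else rest).append(e)

    parent = list(range(n))

    # Step 1: solve the problem on the sparsified graph.
    union_edges(parent, sampled)

    # Step 2: relabel the unsampled edges by their partial components and
    # drop those whose endpoints are already known to be connected.
    crossing = []
    for u, v in rest:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            crossing.append((ru, rv))

    # Step 3: finish on the (usually much smaller) contracted graph.
    union_edges(parent, crossing)

    return [find(parent, v) for v in range(n)]


if __name__ == "__main__":
    labels = components_via_sparsification(6, [(0, 1), (1, 2), (3, 4)])
    print(len(set(labels)), "components")
```

Because every unsampled edge is either discarded as redundant or applied in the final step, the labels match the components of the original graph regardless of which edges were sampled; the sampling only affects how much work is left for the final step.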
