Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines

We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed-memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256-processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate performance. By understanding the effect of machine characteristics on performance, the study sheds light on the impact of improvements in computational and/or communication performance on this challenging problem.
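The two-phase structure described above can be illustrated with a minimal serial sketch. It is written in Python rather than Split-C and is not the authors' implementation. The function names (`local_components`, `hybrid_components`), the list-of-node-lists partition format, and the hooking rule (larger root onto smaller root) are all assumptions made for illustration. Processors are simulated by a plain loop, so the sketch shows only the logic of the local and global phases, not their communication behavior.

```python
from collections import defaultdict


def local_components(nodes, edges):
    """Local phase: label one processor's subgraph with an iterative DFS."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = {}
    for root in nodes:
        if root in label:
            continue
        label[root] = root
        stack = [root]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in label:
                    label[v] = root
                    stack.append(v)
    return label


def hybrid_components(partitions, edges):
    """partitions: one node list per simulated processor (hypothetical format)."""
    owner = {v: p for p, part in enumerate(partitions) for v in part}
    local_edges = [[] for _ in partitions]
    cross_edges = []
    for u, v in edges:
        if owner[u] == owner[v]:
            local_edges[owner[u]].append((u, v))
        else:
            cross_edges.append((u, v))

    # Local phase: each processor collapses its own subgraph independently.
    label = {}
    for part, part_edges in zip(partitions, local_edges):
        label.update(local_components(part, part_edges))

    # Global phase: a simplified Shiloach-Vishkin-style loop over the
    # representatives of the local components (hook, then pointer-jump).
    parent = {r: r for r in set(label.values())}
    changed = True
    while changed:
        changed = False
        for u, v in cross_edges:
            ru, rv = parent[label[u]], parent[label[v]]
            if ru != rv:
                hi, lo = max(ru, rv), min(ru, rv)
                if parent[hi] == hi:  # hook only true roots, larger onto smaller
                    parent[hi] = lo
                    changed = True
        for r in parent:  # pointer jumping until every tree is a star
            while parent[r] != parent[parent[r]]:
                parent[r] = parent[parent[r]]
    return {v: parent[label[v]] for v in label}
```

In this sketch only edges that cross a partition boundary reach the global phase, which is one way to see why the surface-to-volume ratio of the partitioning affects the balance of time between the two phases.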
