Balancing load versus decreasing communication: exploring the tradeoffs

We propose a domain decomposition scheme that seeks to minimize total parallel execution time by weighing, for a particular application and architecture, the relative importance of two competing concerns: balancing the load and minimizing communication. A simulated annealing approach is used to optimize an objective function with components that measure both load balance and communication requirements. We develop an analytical model of execution time based on a finite element code executed on the Intel Paragon, and we use this model to compare partitions with varying degrees of load imbalance. Most of the literature on decomposition methods heavily emphasizes load balancing over minimizing communication. Our results indicate that this restrictive approach to load balancing can be relaxed without degrading performance. Further, the degree of relaxation possible depends on both the target machine and the application; neither can be neglected.
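The idea of annealing over an objective that mixes load balance and communication can be illustrated with a small sketch. It assumes a mesh graph given as adjacency lists with per-vertex work weights, a hypothetical cost `alpha * imbalance + beta * edge_cut`, single-vertex moves, a Metropolis acceptance rule, and geometric cooling. The function names, the weights `alpha` and `beta`, and the cooling schedule are illustrative assumptions, not the paper's actual formulation or its Paragon execution-time model.

```python
import math
import random


def grid_graph(rows, cols):
    """Adjacency lists for a rows x cols grid (a hypothetical stand-in for a mesh graph)."""
    adj = [[] for _ in range(rows * cols)]
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                adj[v].append(v + 1)
                adj[v + 1].append(v)
            if r + 1 < rows:
                adj[v].append(v + cols)
                adj[v + cols].append(v)
    return adj


def partition_cost(adj, weights, part, k, alpha, beta):
    """Hypothetical objective: alpha * load imbalance + beta * edge cut."""
    loads = [0.0] * k
    for v, w in enumerate(weights):
        loads[part[v]] += w
    # Load imbalance: how far the heaviest part exceeds the average load.
    imbalance = max(loads) - sum(loads) / k
    # Edge cut: each undirected edge crossing parts is counted once (u < v).
    cut = sum(1 for u in range(len(adj)) for v in adj[u]
              if u < v and part[u] != part[v])
    return alpha * imbalance + beta * cut


def anneal_partition(adj, weights, k, alpha=1.0, beta=1.0,
                     t0=10.0, cooling=0.995, steps=20000, seed=0):
    """Simulated annealing over single-vertex moves; returns (best_part, best_cost)."""
    if k < 2:
        raise ValueError("need at least two parts")
    rng = random.Random(seed)
    n = len(adj)
    part = [v % k for v in range(n)]  # round-robin starting partition
    cost = partition_cost(adj, weights, part, k, alpha, beta)
    best, best_cost = part[:], cost
    temp = t0
    for _ in range(steps):
        v = rng.randrange(n)
        old = part[v]
        new = rng.randrange(k - 1)
        if new >= old:
            new += 1  # uniform choice among the other k - 1 parts
        part[v] = new
        new_cost = partition_cost(adj, weights, part, k, alpha, beta)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept uphill moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = part[:], cost
        else:
            part[v] = old  # rejected: undo the move
        temp = max(temp * cooling, 1e-12)  # geometric cooling, floored to avoid division by zero
    return best, best_cost


if __name__ == "__main__":
    adj = grid_graph(4, 4)
    weights = [1.0] * len(adj)
    for beta in (0.1, 1.0, 10.0):
        part, cost = anneal_partition(adj, weights, k=2, beta=beta)
        print(beta, cost, part)
```

In this sketch, raising `beta` relative to `alpha` plays the role of a machine where communication is expensive: the annealer becomes more willing to accept some imbalance in exchange for a smaller cut. Choosing those weights from a measured execution-time model, rather than fixing them a priori, is the kind of machine- and application-dependent tradeoff the abstract describes.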
