Load Balancing on Message Passing Architectures

Many natural processes are best modeled by dynamic, Monte Carlo-type algorithms. Parallelizing such algorithms raises several problems, one of which is low overall efficiency caused by an imbalanced work load. This paper describes the implementation of a testbed for load balancing techniques. The testbed is used to evaluate different static and dynamic strategies for balancing the work load of an iPSC/2 implementation of a simple simulation of population evolution. One of the new techniques described here is a decentralized direct method, which combines the advantages of local and global strategies. To compare the balancing methods, a clear separation was made between the work load (the algorithm solving a given problem, here a population simulation) and the balancer. That this separation is feasible implies that the burden of developing an appropriate load balancer for a given algorithm may be removed from the programmer. The experience gained with load balancing for this class of problems will help guide the development of automated load balancing techniques, provided either by an operating system or by a run-time system for a high-level language.
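The separation between work load and balancer can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Workload` interface, the `PopulationSim` stand-in, and `direct_balance` are hypothetical names, message passing is replaced by in-memory counters, and every work unit (individual) is assumed to cost the same time. The balancer shown is a simple centralized direct scheme that moves surplus straight from overloaded to underloaded processors; it is not the decentralized direct method described above. The point is that `direct_balance` touches only per-processor load counts and a `move` operation, never the population model itself.

```python
from typing import List, Protocol


class Workload(Protocol):
    """Hypothetical interface: all a balancer may know about the application."""

    def loads(self) -> List[int]: ...
    def move(self, src: int, dst: int, amount: int) -> None: ...


class PopulationSim:
    """Toy stand-in for a population simulation: individuals held per processor."""

    def __init__(self, counts: List[int]) -> None:
        self.counts = list(counts)

    def loads(self) -> List[int]:
        return list(self.counts)

    def move(self, src: int, dst: int, amount: int) -> None:
        # In a real message-passing code this would send `amount` individuals
        # from processor `src` to processor `dst`; here it just adjusts counters.
        self.counts[src] -= amount
        self.counts[dst] += amount


def efficiency(loads: List[int]) -> float:
    """Assuming equal cost per unit, step time is set by the busiest processor,
    so efficiency = mean load / max load."""
    return (sum(loads) / len(loads)) / max(loads)


def direct_balance(work: Workload) -> None:
    """Move surplus directly from overloaded to underloaded processors.

    Uses only the Workload interface, never application details.
    """
    loads = work.loads()
    n, total = len(loads), sum(loads)
    # Target loads: as even as integer work units allow (they sum to `total`).
    target = [total // n + (1 if i < total % n else 0) for i in range(n)]
    senders = [[i, loads[i] - target[i]] for i in range(n) if loads[i] > target[i]]
    receivers = [[i, target[i] - loads[i]] for i in range(n) if loads[i] < target[i]]
    s = r = 0
    while s < len(senders) and r < len(receivers):
        amount = min(senders[s][1], receivers[r][1])
        work.move(senders[s][0], receivers[r][0], amount)
        senders[s][1] -= amount
        receivers[r][1] -= amount
        if senders[s][1] == 0:
            s += 1
        if receivers[r][1] == 0:
            r += 1


sim = PopulationSim([40, 10, 25, 5])
print(f"before: {sim.loads()} efficiency={efficiency(sim.loads()):.2f}")
direct_balance(sim)
print(f"after:  {sim.loads()} efficiency={efficiency(sim.loads()):.2f}")
```

On the toy loads [40, 10, 25, 5], efficiency rises from 0.50 to 1.00. Because the balancer depends only on the `Workload` interface, a different application (or a different balancing strategy, such as a neighbor-to-neighbor diffusion scheme) could be swapped in without changing the other side.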