Experiments in Dynamic Load Balancing for Parallel Cluster Computing

In academic and industrial institutions, the emphasis in High Performance Computing is shifting from parallel monoliths to clusters of high performance workstations. These loosely coupled parallel systems require new programming paradigms and environments that give the user tools to exploit the full potential of the available distributed resources. Although such cluster computing systems offer large amounts of processing power, their usability and efficiency are largely determined by environmental changes, such as variation in the demand for processing power and in the number of available processors. To optimize resource utilization under these changes, it is necessary to migrate running tasks between processors, i.e., to perform dynamic load balancing. We introduced a scheduling mechanism in PVM that supports such load balancing for parallel tasks running on loosely coupled parallel systems. The enhanced system is called DynamicPVM. Our primary objective is to study models describing adaptive systems like DynamicPVM. Validating these models requires experiments with actual implementations of such dynamic systems. The work presented here reports on a pilot implementation of DynamicPVM. We chose PVM [5] as the basic parallel programming environment because it is the most widely used environment to date and is considered the de facto standard. The process migration primitives used in DynamicPVM were initially based on the checkpoint-restart mechanisms found in a well-established global scheduling system, Condor [3], but have since been replaced by our own routines in order to support our pool of Solaris workstations and to reduce checkpoint overhead. Table 1 shows different aspects of load management for the three systems discussed in this paper. We use the term job to indicate the largest entity of execution (program), consisting of one task (serial program) or more cooperating tasks (parallel program).
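The idea of migrating running tasks to even out load can be illustrated with a small, self-contained sketch. This is not the DynamicPVM scheduler: the `Host` class, the `pick_migration` and `rebalance` functions, the abstract per-task load numbers, and the `threshold` parameter are all hypothetical names and assumptions introduced here. The sketch uses a simple greedy policy that repeatedly moves one task from the most loaded host to the least loaded host, and it ignores the checkpoint and restart costs that a real migration would incur.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical workstation holding (task_id, load) pairs."""
    name: str
    tasks: list = field(default_factory=list)

    @property
    def load(self):
        # Total load is the sum of the (assumed known) per-task loads.
        return sum(weight for _, weight in self.tasks)


def pick_migration(hosts, threshold=1.0):
    """Return (task_id, source_name, target_name), or None if balanced enough."""
    src = max(hosts, key=lambda h: h.load)
    dst = min(hosts, key=lambda h: h.load)
    gap = src.load - dst.load
    if gap <= threshold:
        return None
    # Moving a task of load w changes the gap to |gap - 2w|.
    # Only tasks with 0 < w < gap strictly improve the balance.
    candidates = [(abs(gap - 2 * w), tid) for tid, w in src.tasks if 0 < w < gap]
    if not candidates:
        return None
    _, task_id = min(candidates)
    return task_id, src.name, dst.name


def rebalance(hosts, threshold=1.0, max_steps=100):
    """Apply greedy migrations until none is worthwhile; return the moves made."""
    by_name = {h.name: h for h in hosts}
    moves = []
    for _ in range(max_steps):
        decision = pick_migration(hosts, threshold)
        if decision is None:
            break
        task_id, src_name, dst_name = decision
        src, dst = by_name[src_name], by_name[dst_name]
        # In a real system this step would checkpoint the task on the
        # source host and restart it on the target host.
        task = next(t for t in src.tasks if t[0] == task_id)
        src.tasks.remove(task)
        dst.tasks.append(task)
        moves.append(decision)
    return moves
```

For example, calling `rebalance` on three hosts with loads 5, 0, and 1 moves tasks away from the first host until no pair of hosts differs by more than the threshold. A production scheduler would also have to weigh the migration cost against the expected gain, which this sketch leaves out.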

[1]  Peter M. A. Sloot, et al. DynamicPVM - Dynamic Load Balancing on Parallel Systems, 1994, HPCN.

[2]  Leen Dikken, et al. DynamicPVM: Task Migration in PVM, 1993.

[3]  Miron Livny, et al. Condor-a hunter of idle workstations, 1988, [1988] Proceedings. The 8th International Conference on Distributed.

[4]  Vaidy S. Sunderam, et al. PVM: A Framework for Parallel Distributed Computing, 1990, Concurr. Pract. Exp.

[5]  D. B. Davis, et al. Sun Microsystems Inc., 1993.

[6]  Raj Srinivasan, et al. XDR: External Data Representation Standard, 1995, RFC.