A Malleable-Job System for Timeshared Parallel Machines

Malleable jobs are parallel programs that can change the number of processors on which they execute at run time in response to an external command. One advantage of such jobs is that a scheduler for malleable jobs can achieve better system utilization and average response time than a scheduler for traditional, rigid jobs. In this paper, we present a programming system for creating malleable jobs that is more general than other current malleable systems: it is not limited to the master-worker paradigm or the Fortran SPMD programming model, but supports general-purpose parallel programs, including those written in MPI and Charm++, and provides built-in migration and load balancing, among other features.
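The system's own interface is not reproduced in this abstract, but the core idea can be illustrated with a minimal, self-contained C++ sketch. The class name MalleablePool and its resize command are hypothetical illustrations, not the paper's actual API: a pool of workers grows or shrinks at run time in response to an external resize request, much as a malleable job changes its processor count when the scheduler asks.

```cpp
// Hypothetical sketch of the malleable-job idea (not the paper's API):
// a worker pool whose size can be changed at run time by an external
// "resize" command, analogous to a scheduler shrinking or expanding a job.
#include <atomic>
#include <chrono>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class MalleablePool {
public:
    explicit MalleablePool(std::size_t n) { resize(n); }
    ~MalleablePool() { resize(0); }  // join all remaining workers

    // External malleability command: change the number of workers.
    void resize(std::size_t target) {
        std::lock_guard<std::mutex> lock(mu_);
        // Grow: start new workers, each with its own stop flag.
        while (workers_.size() < target) {
            stop_flags_.push_back(std::make_unique<std::atomic<bool>>(false));
            std::atomic<bool>* stop = stop_flags_.back().get();
            workers_.emplace_back([stop] {
                while (!stop->load()) {
                    // Placeholder for one slice of the parallel computation.
                    std::this_thread::sleep_for(std::chrono::milliseconds(50));
                }
            });
        }
        // Shrink: signal and join surplus workers.
        while (workers_.size() > target) {
            stop_flags_.back()->store(true);
            workers_.back().join();
            workers_.pop_back();
            stop_flags_.pop_back();
        }
    }

private:
    std::mutex mu_;
    std::vector<std::thread> workers_;
    std::vector<std::unique_ptr<std::atomic<bool>>> stop_flags_;
};

int main() {
    MalleablePool pool(4);  // start the "job" on 4 virtual processors
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    pool.resize(2);         // external command: shrink to 2
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    pool.resize(8);         // external command: expand to 8
}                           // destructor joins all workers
```

This sketch deliberately ignores the hard parts a real malleable system must handle, such as redistributing data among the remaining processors and migrating work on shrink, which the paper's system addresses through its built-in migration and load balancing.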
