Modeling and analysis of dynamic coscheduling in parallel and distributed environments

Scheduling in large-scale parallel systems has been, and continues to be, an important and challenging research problem. Several key factors, including the increasing use of off-the-shelf clusters of workstations to build such parallel systems, have led to the emergence of a new class of scheduling strategies broadly referred to as dynamic coscheduling. Unfortunately, both the design space and the performance space of these emerging strategies are quite large. This is due in part to the numerous dynamic interactions among the components of the parallel computing environment, and in part to the wide range of applications and systems that the environment can comprise. As a result, it is difficult to fully explore the benefits and limitations of the various proposed dynamic coscheduling approaches for large-scale systems through simulation and/or experimentation alone.

To gain a better understanding of the fundamental properties of different dynamic coscheduling methods, we formulate a general mathematical model of this class of scheduling strategies within a unified framework that allows us to investigate a wide range of parallel environments. We develop a matrix-analytic analysis based on a stochastic decomposition and a fixed-point iteration. We perform a large number of numerical experiments, in part to examine the accuracy of our approach, and the numerical results are in excellent agreement with detailed simulation results. Our mathematical model and analysis are then used to explore several fundamental design and performance tradeoffs associated with the class of dynamic coscheduling policies across a broad spectrum of parallel computing environments.
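The abstract names a matrix-analytic analysis without spelling out its steps. As a hedged, generic illustration only (not the authors' coscheduling model), the sketch below computes the rate matrix R of a level-independent quasi-birth-death (QBD) process, the central object of Neuts' matrix-geometric method, using the classical fixed-point iteration R = -(A0 + R^2 A2) A1^{-1} started from R = 0. The blocks A0 (level up), A1 (within level) and A2 (level down), the helper names and the example rates are all hypothetical. This inner iteration for R is also distinct from, and should not be read as, the fixed-point iteration over the stochastic decomposition described above.

```python
"""Minimal sketch: matrix-geometric rate matrix R for a level-independent QBD.

Hypothetical illustration only; the generator blocks used here are NOT the
coscheduling model from the paper.
"""
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum a + b."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def solve_r(a0: Matrix, a1: Matrix, a2: Matrix,
            tol: float = 1e-12, max_iter: int = 100_000) -> Matrix:
    """Minimal nonnegative solution of A0 + R A1 + R^2 A2 = 0.

    Iterates R_{k+1} = -(A0 + R_k^2 A2) A1^{-1} from R_0 = 0, which
    increases monotonically to R for a positive-recurrent QBD.
    """
    n = len(a0)
    a1_inv = inverse(a1)
    r = [[0.0] * n for _ in range(n)]
    for _ in range(max_iter):
        nxt = matmul(matadd(a0, matmul(matmul(r, r), a2)), a1_inv)
        nxt = [[-v for v in row] for row in nxt]
        diff = max(abs(x - y) for rn, ro in zip(nxt, r)
                   for x, y in zip(rn, ro))
        r = nxt
        if diff < tol:
            return r
    raise RuntimeError("R iteration did not converge")


if __name__ == "__main__":
    # Scalar sanity check: an M/M/1 queue (arrival 1, service 2) gives R = 0.5.
    print(solve_r([[1.0]], [[-3.0]], [[2.0]]))
```

In a full matrix-geometric analysis, once the boundary equations are solved, the stationary level probabilities follow as pi_{n+1} = pi_n R. The simple iteration is used here for readability; faster schemes such as logarithmic reduction exist but are not shown.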