Self-tuning job scheduling strategies for the resource management of HPC systems and computational grids

In this thesis we develop and study self-tuning job schedulers for resource management systems. Such schedulers search for the best solution among the available scheduling alternatives in order to improve on the performance of static schedulers. We implement this concept in two domains of real-world job scheduling.

First, we study scheduling in resource management software for high-performance computing (HPC) systems. Typically, a single scheduling policy such as first-come-first-serve (FCFS) is used, although the characteristics of the submitted jobs change constantly. Using a single policy can cause a performance loss, because other policies may be better suited to specific job characteristics. We develop a self-tuning scheduler that automatically evaluates all implemented policies and switches to the best one. This improves performance in terms of increased utilization and decreased waiting time.

Second, we develop and study an adaptive scheduler for computational grid environments. In such grids, several geographically distributed HPC machines are joined in order to increase the available computational power. Grid jobs may be scheduled across multiple machines, so that the communication among the job parts runs over slow wide area networks. This often induces an additional communication overhead, which the grid scheduler has to take into account. Our adaptive grid scheduler models the slower communication over wide area networks by extending the execution time of such multi-site jobs. It automatically checks which of two options is more beneficial: waiting for enough resources at a single site, or immediately using multiple sites and the slower wide area network.

In both cases we use discrete event simulations to evaluate the performance of the developed schedulers.
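
The policy switch described above can be sketched as a small planning-based simulation. This is a minimal illustration under stated assumptions, not the thesis's implementation. The three queue orders (FCFS, shortest job first SJF, longest job first LJF), the use of average waiting time as the quality criterion, and all names (`Job`, `plan_queue`, `self_tune`) are hypothetical. For every policy, the sketch plans all waiting jobs onto a machine with `nodes` nodes. Each job is placed at the earliest time where it fits without delaying jobs planned before it. The scheduler then keeps the policy whose plan gives the lowest average wait.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    submit: float   # submission time
    width: int      # requested nodes
    runtime: float  # estimated runtime


# Hypothetical policy set: each policy is just a sort key for the wait queue.
POLICIES = {
    "FCFS": lambda j: (j.submit, j.name),
    "SJF": lambda j: (j.runtime, j.submit, j.name),
    "LJF": lambda j: (-j.runtime, j.submit, j.name),
}


def used_nodes(plan, t):
    """Nodes occupied at time t by (start, end, width) reservations."""
    return sum(w for s, e, w in plan if s <= t < e)


def earliest_start(plan, job, now, nodes):
    """Earliest time >= now at which job fits for its whole runtime."""
    if job.width > nodes:
        raise ValueError(f"{job.name} is wider than the machine")
    # The earliest feasible start is `now` or the end of some reservation.
    for t in sorted({now} | {e for _, e, _ in plan if e > now}):
        end = t + job.runtime
        # Usage only rises at reservation starts, so checking t and every
        # start inside (t, end) covers the whole interval.
        checks = [t] + [s for s, _, _ in plan if t < s < end]
        if all(used_nodes(plan, c) + job.width <= nodes for c in checks):
            return t
    raise RuntimeError("unreachable: the machine is empty after the last end")


def plan_queue(queue, running, now, nodes, policy):
    """Plan all waiting jobs in the policy's order; return {name: start}."""
    plan = list(running)  # (start, end, width) of jobs already executing
    starts = {}
    for job in sorted(queue, key=POLICIES[policy]):
        t = earliest_start(plan, job, now, nodes)
        plan.append((t, t + job.runtime, job.width))
        starts[job.name] = t
    return starts


def avg_wait(queue, starts):
    return sum(starts[j.name] - j.submit for j in queue) / len(queue)


def self_tune(queue, running, now, nodes):
    """Evaluate every policy on the current queue; keep the lowest average
    wait (ties go to the policy listed first in POLICIES)."""
    scores = {
        p: avg_wait(queue, plan_queue(queue, running, now, nodes, p))
        for p in POLICIES
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    queue = [Job("A", 0, 4, 100), Job("B", 1, 2, 5), Job("C", 2, 2, 5)]
    best, scores = self_tune(queue, running=[(0, 10, 2)], now=2, nodes=4)
    print(best, scores)
```

In the `__main__` example, a wide, long job A competes with two short jobs. Under FCFS and LJF, A is planned first and job C cannot start until A has finished. SJF plans the two short jobs ahead of A and therefore yields the lowest average wait: 6 time units, compared with about 39.7 for the other two policies. Repeating this comparison whenever the queue changes is what allows such a scheduler to follow changing job characteristics.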

The results for the self-tuning scheduler show that both an increased utilization of the system and a decreased waiting time for the jobs are achievable. We therefore believe that such self-tuning schedulers should be used in modern resource management systems for HPC machines. The evaluation of the grid scheduler shows that, in general, a combination of many small machines and multi-site scheduling cannot perform as well as a single large machine with the same amount of resources. However, the adaptive multi-site scheduler significantly reduces this performance difference. We believe that participating in computational grid environments is beneficial, as it allows larger problems requiring more computational power to be solved.
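
The single-site versus multi-site decision of the adaptive grid scheduler can be illustrated with a simplified model. This is an assumption-laden sketch, not the thesis's algorithm. Each site is described only by the time at which each of its nodes becomes idle. The wide-area penalty is a hypothetical constant factor `overhead` by which a multi-site job's runtime is extended. The names `Site`, `kth_free` and `choose_placement` are also hypothetical. The function compares the completion time on the best single site with that of the earliest multi-site start and picks whichever finishes first, preferring a single site on ties.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_at: list[float]  # hypothetical: time at which each node becomes idle


def kth_free(times, k):
    """Earliest time at which k of the given nodes are idle (inf if too few)."""
    return sorted(times)[k - 1] if 0 < k <= len(times) else math.inf


def choose_placement(sites, width, runtime, now, overhead):
    """Return (kind, site, start, end) for the option that finishes first.

    kind is "single" or "multi". `overhead` is an assumed relative runtime
    extension for wide-area communication (0.25 = 25 % longer). Ties are
    resolved in favour of a single site, which avoids the WAN entirely.
    """
    # Option 1: wait until one site alone has `width` idle nodes.
    single = []
    for s in sites:
        start = max(now, kth_free(s.free_at, width))
        single.append((start + runtime, start, s.name))
    s_end, s_start, s_name = min(single)

    # Option 2: start as soon as `width` nodes are idle anywhere in the grid,
    # paying the extended runtime. If those nodes happen to lie in one site,
    # option 1 finishes at least as early and wins.
    all_nodes = [t for s in sites for t in s.free_at]
    m_start = max(now, kth_free(all_nodes, width))
    m_end = m_start + runtime * (1 + overhead)

    if math.isinf(s_end) and math.isinf(m_end):
        raise ValueError("job is wider than the whole grid")
    if s_end <= m_end:
        return ("single", s_name, s_start, s_end)
    return ("multi", None, m_start, m_end)


if __name__ == "__main__":
    sites = [Site("A", [0.0] * 6 + [50.0] * 2),
             Site("B", [0.0] * 4 + [30.0] * 4)]
    for ovh in (0.25, 0.5):
        print(ovh, choose_placement(sites, 8, 100.0, 0.0, ovh))
```

Consider two 8-node sites and an 8-node job with runtime 100. With a 25 % extension, the multi-site run starts at once and finishes at 125, which is earlier than the best single site (site B, finishing at 130). With a 50 % extension, the multi-site run would end at 150, so waiting for site B is preferred. A job wider than every single site always falls back to multi-site execution.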