Comparative Performance Analysis of Job Scheduling Algorithms in a Real-World Scientific Application

In High Performance Computing (HPC), it is common to deal with substantial computing resources, which makes a Resource Management System (RMS) fundamental. The job scheduling algorithm is a key part of an RMS, and selecting the algorithm that best meets user needs is highly relevant. In this work, we use a real-world scientific application to evaluate the performance of four job scheduling algorithms: First In, First Out (FIFO), Shortest Job First (SJF), EASY-backfilling, and Fattened-backfilling. The algorithms ran under the SLURM workload manager as the RMS, with a scientific application that predicts the dynamics of the Earth's ionosphere as the workload. Our results highlight each algorithm's strengths and weaknesses across scenarios that vary the opportunity to advance smaller jobs in the queue. To deepen the analysis, we also compared the job scheduling algorithms using four jobs from the Numerical Aerodynamic Simulation (NAS) Parallel Benchmarks in a controlled scenario.
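To make the notion of "advancing smaller jobs" concrete, the following is a minimal sketch of how FIFO, SJF, and EASY-backfilling might order the same queue on a toy cluster. It is not the implementation evaluated in this work, nor SLURM's scheduler. It assumes that all jobs are submitted at time 0 and that runtimes are known exactly. The names `Job`, `simulate`, and `JOBS` are hypothetical, and Fattened-backfilling is omitted.

```python
import heapq
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    nodes: int    # nodes requested
    runtime: int  # assumed to be known exactly (a simplification)


def simulate(jobs, total_nodes, policy="fifo"):
    """Return {job name: start time}; every job is assumed submitted at t=0."""
    if any(j.nodes > total_nodes for j in jobs):
        raise ValueError("a job requests more nodes than the cluster has")
    # SJF reorders the queue by runtime; FIFO and EASY keep submission order.
    queue = sorted(jobs, key=lambda j: j.runtime) if policy == "sjf" else list(jobs)
    running = []  # min-heap of (end_time, nodes)
    free, now, starts = total_nodes, 0, {}

    def start(job):
        nonlocal free
        starts[job.name] = now
        free -= job.nodes
        heapq.heappush(running, (now + job.runtime, job.nodes))

    while queue:
        # Start jobs from the head of the queue while they fit.
        while queue and queue[0].nodes <= free:
            start(queue.pop(0))
        if not queue:
            break
        if policy == "easy":
            # Reserve the earliest time (shadow) at which the head job fits.
            head, avail = queue[0], free
            for end, nodes in sorted(running):
                avail += nodes
                if avail >= head.nodes:
                    shadow, extra = end, avail - head.nodes
                    break
            # Backfill later jobs that cannot delay the head job's reservation.
            for job in list(queue[1:]):
                ends_in_time = now + job.runtime <= shadow
                if job.nodes <= free and (ends_in_time or job.nodes <= extra):
                    queue.remove(job)
                    start(job)
                    if not ends_in_time:
                        extra -= job.nodes
        # Advance the clock to the next job completion.
        end, nodes = heapq.heappop(running)
        now, free = end, free + nodes
    return starts


JOBS = [Job("A", 2, 10), Job("B", 4, 5), Job("C", 2, 8), Job("D", 1, 20)]

if __name__ == "__main__":
    for policy in ("fifo", "sjf", "easy"):
        print(policy, simulate(JOBS, total_nodes=4, policy=policy))
```

In this toy queue, EASY-backfilling starts the small job C immediately in the nodes left idle while the wide job B waits, without postponing B's start, whereas FIFO leaves those nodes unused until B has run.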
