On Interactions among Scheduling Policies: Finding Efficient Queue Setup Using High-Resolution Simulations

Many studies over the past two decades have focused on efficient job scheduling in HPC and Grid-like systems. While many new scheduling algorithms have been proposed for systems with specific requirements, mainstream resource management systems and schedulers still use only a limited set of scheduling policies. Production systems need to balance various policies that are put in place to satisfy both the resource providers and the users (or virtual organizations) of the system. While many works address these policies separately, e.g., fairshare for fair resource allocation, only a few address the interactions between them. In this paper we describe how to approach these interactions when developing site-specific policies. Notably, we describe how (priority) queues interact with scheduling algorithms, fairshare, and anti-starvation mechanisms. Moreover, we present a case study describing how an advanced simulation tool was used to find a new configuration for an actual resource manager deployed in the Czech National Grid, significantly increasing its performance.
