Stability for Two-class Multiserver-job Systems

Multiserver-job systems, in which a job requires concurrent service at many servers, occur widely in practice. Much is known in the dropping setting, where a job is immediately discarded if it requires more servers than are currently available. However, very little is known in the more practical setting where jobs queue instead. In this paper, we derive a closed-form analytical expression for the stability region of a two-class (non-dropping) multiserver-job system in which each class of jobs requires a distinct number of servers and has its own exponential service time distribution, and jobs are served in first-come-first-served (FCFS) order. This is the first result of any kind for an FCFS multiserver-job system where the classes have distinct service distributions. Our work is based on a technique that leverages the idea of a "saturated" system, in which an unlimited number of jobs are always available. Our analytical formula provides insight into the behavior of FCFS multiserver-job systems, highlighting the huge wastage (idle servers while jobs are in the queue) that can occur, as well as the nonmonotonic effects of the service rates on wastage.
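The saturated-system idea can be illustrated numerically. The following is a minimal simulation sketch, not the closed-form expression derived in the paper. It assumes a system with `k` servers in which the FCFS queue is never empty, each queued job is independently of class 1 with probability `p1`, and the head-of-line job blocks all jobs behind it when it does not fit. All names (`saturated_throughput`, `k`, `n1`, `n2`, `mu1`, `mu2`, `p1`) are hypothetical and chosen for illustration. The sketch also assumes that the long-run completion rate of the saturated system serves as the stability threshold for the total arrival rate.

```python
import random

def saturated_throughput(k, n1, n2, mu1, mu2, p1, completions=200_000, seed=0):
    """Simulate a saturated two-class FCFS multiserver-job system.

    Returns (completion rate, fraction of server capacity left idle).
    Assumption of this sketch: the queue always holds more jobs, and each
    job is class 1 with probability p1, independently of the others.
    """
    assert 1 <= n1 <= k and 1 <= n2 <= k, "each job must fit in an empty system"
    rng = random.Random(seed)
    need = (n1, n2)          # servers required per job, by class
    rate = (mu1, mu2)        # exponential service rate, by class
    in_service = [0, 0]      # number of jobs in service, by class
    free = k                 # currently idle servers
    head = 0 if rng.random() < p1 else 1   # class of the head-of-line job
    elapsed = 0.0
    wasted = 0.0             # idle-server time accumulated while jobs wait

    for _ in range(completions):
        # FCFS admission: start jobs from the head until the head does not fit.
        while need[head] <= free:
            in_service[head] += 1
            free -= need[head]
            head = 0 if rng.random() < p1 else 1

        # Time to the next completion is exponential with the total service rate.
        rate0 = in_service[0] * rate[0]
        total = rate0 + in_service[1] * rate[1]
        dt = rng.expovariate(total)
        elapsed += dt
        wasted += free * dt

        # The completing job's class is chosen in proportion to its class rate.
        done = 0 if rng.random() * total < rate0 else 1
        in_service[done] -= 1
        free += need[done]

    return completions / elapsed, wasted / (k * elapsed)


if __name__ == "__main__":
    thr, waste = saturated_throughput(k=4, n1=1, n2=4, mu1=1.0, mu2=1.0, p1=0.5)
    print(f"estimated saturated throughput: {thr:.3f} jobs per unit time")
    print(f"estimated wasted capacity:      {waste:.3f}")
```

When both classes need a single server, the head-of-line job always fits as soon as any server frees up, so the measured wastage is exactly zero and the throughput estimate approaches `k` times the common service rate. Mixing one-server jobs with jobs that need all `k` servers makes the wastage strictly positive, which is the kind of effect the abstract describes.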
