Stability of Redundancy Systems with Processor Sharing

We investigate the stability condition for redundancy-d systems in which every server operates under the processor-sharing (PS) discipline. Job sizes may be generally distributed, and the d replica sizes of a job may be dependent, governed by an arbitrary joint distribution. We establish that the associated fluid-limit model is stable when the expected minimum of the d replica sizes is less than the mean interarrival time per server. In the special case of identical replicas, the stability condition is insensitive to the job size distribution given its mean, and the stability threshold is inversely proportional to the number of replicas. In the special case of i.i.d. replicas, the stability threshold decreases in the number of replicas for job size distributions that are New-Better-than-Used (NBU) and increases for distributions that are New-Worse-than-Used (NWU). We also discuss extensions to scenarios with heterogeneous servers.
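As a rough illustration of how this condition can be evaluated, the sketch below computes the largest per-server arrival rate it allows in three cases. The notation is assumed here and does not appear in the abstract: jobs arrive at rate λ to N servers, and "mean interarrival time per server" is read as N/(λd), on the assumption that each server receives replicas at rate λd/N. Under this reading the condition becomes d · (λ/N) · E[min(X_1, ..., X_d)] < 1, which agrees with the stated inverse proportionality for identical replicas. The helper names and the uniform distribution, chosen as one example of an NBU distribution, are hypothetical.

```python
from fractions import Fraction


def max_stable_rate_per_server(d, expected_min):
    """Supremum of lambda/N satisfying d * (lambda/N) * E[min] < 1.

    Assumes each server sees replica arrivals at rate lambda * d / N
    (an interpretation of the abstract, not a quoted formula).
    """
    return 1 / (d * expected_min)


def expected_min_identical(mean, d):
    # Identical replicas: all d sizes are equal, so the minimum is the size itself.
    return mean


def expected_min_iid_exponential(mean, d):
    # Minimum of d i.i.d. exponentials with the given mean has mean mean / d.
    return mean / d


def expected_min_iid_uniform(mean, d):
    # Uniform on (0, 2 * mean), an NBU distribution (illustrative choice):
    # the minimum of d i.i.d. samples has mean 2 * mean / (d + 1).
    return 2 * mean / (d + 1)


if __name__ == "__main__":
    mean = Fraction(1)  # unit mean job size, exact arithmetic
    for d in (1, 2, 3, 4):
        identical = max_stable_rate_per_server(d, expected_min_identical(mean, d))
        exponential = max_stable_rate_per_server(d, expected_min_iid_exponential(mean, d))
        uniform = max_stable_rate_per_server(d, expected_min_iid_uniform(mean, d))
        print(f"d={d}: identical={identical}, iid exp={exponential}, iid uniform={uniform}")
```

With unit mean job size, the identical-replica threshold evaluates to 1/d, the i.i.d. exponential threshold stays at 1 for every d (the exponential distribution is both NBU and NWU), and the i.i.d. uniform threshold (d + 1)/(2d) decreases in d, in line with the NBU case.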
