Paper information - Achievable Stability in Redundancy Systems

Achievable Stability in Redundancy Systems

We consider a system with N parallel servers where incoming jobs are immediately replicated to, say, d servers. Each of the N servers has its own queue and follows an FCFS discipline. As soon as the first job replica is completed, the remaining replicas are abandoned. We investigate the achievable stability region for a quite general workload model with different job types and heterogeneous servers, reflecting job-server affinity relations which may arise from data locality issues and soft compatibility constraints. Under the assumption that job types are known beforehand, we show for New-Better-than-Used (NBU) distributed speed variations that no replication $(d=1)$ gives a strictly larger stability region than replication $(d>1)$. Strikingly, this does not depend on the underlying distribution of the intrinsic job sizes, but observing the job types is essential for this statement to hold. In the case of non-observable job types, we show that for New-Worse-than-Used (NWU) distributed speed variations full replication $(d=N)$ gives a larger stability region than no replication $(d=1)$.
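The replication mechanics described in the abstract (FCFS queues, with the remaining replicas abandoned once the first one completes) can be illustrated with a minimal sketch. This is not the authors' code: the function name `simulate_redundancy`, the input layout, and the bookkeeping via a per-server "free at" time are all assumptions made for illustration. Per-replica service times are taken as given inputs, so any model of job sizes and speed variations can be plugged in.

```python
def simulate_redundancy(n_servers, arrivals, replicas):
    """Sketch of FCFS replication with cancel-on-completion (hypothetical helper).

    arrivals: job arrival times in non-decreasing order.
    replicas: one dict per job mapping server index -> service time
              of that job's replica on that server.
    Returns the completion time of each job.
    """
    # free_at[s] = time at which server s has cleared all earlier jobs
    free_at = [0.0] * n_servers
    completions = []
    for t, reps in zip(arrivals, replicas):
        # A replica starts once the job has arrived and its server is free;
        # the job is done when its earliest replica finishes.
        done = min(max(t, free_at[s]) + b for s, b in reps.items())
        for s in reps:
            # A replica already in service is abandoned at `done`;
            # one still waiting is removed without occupying the server.
            free_at[s] = max(free_at[s], done)
        completions.append(done)
    return completions


# Two servers: job 0 is replicated to both, job 1 goes to server 0 only.
example = simulate_redundancy(
    n_servers=2,
    arrivals=[0.0, 1.0],
    replicas=[{0: 5.0, 1: 2.0}, {0: 1.0}],
)
print(example)  # → [2.0, 3.0]
```

In the example, job 0 finishes on server 1 at time 2, which also releases server 0 at time 2 even though its replica needed 5 time units. That release is the potential benefit of replication; the capacity server 0 spent on the abandoned replica is the cost that affects the stability region.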

Sem Borst | Youri Raaijmakers
