Efficient Redundancy Techniques for Latency Reduction in Cloud Systems

In cloud computing systems, assigning a task to multiple servers and waiting for the earliest copy to finish is an effective way to combat the variability in the response times of individual servers and thereby reduce latency. However, adding redundancy may increase the cost of computing resources, and the higher traffic load may increase queueing delay. This work helps in understanding when and how redundancy gives a cost-efficient reduction in latency. For a general task service time distribution, we compare redundancy strategies in terms of the number of redundant tasks and the times at which they are issued and canceled. We find that the log-concavity of the task service time distribution creates a dichotomy in when adding redundancy helps. If the service time distribution is log-convex (i.e., the log of the tail probability is convex), then adding maximum redundancy reduces both latency and cost. If it is log-concave (i.e., the log of the tail probability is concave), then less redundancy and early cancellation of redundant tasks are more effective. Using these insights, we design a general redundancy strategy that achieves a good latency-cost trade-off for an arbitrary service time distribution. This work also generalizes and extends some results in the analysis of fork-join queues.
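The dichotomy can be illustrated with a small numerical sketch. It is not the paper's model: it ignores queueing entirely and assumes that all r copies of a task start at the same time on idle servers and are canceled as soon as the first one finishes. Under those assumptions, latency is the expected minimum of r i.i.d. service times, and computing cost is taken to be r times that value (total server time spent). The hyperexponential distribution serves as a log-convex example and the shifted exponential as a log-concave one; the function names and parameter values are hypothetical and chosen only for illustration.

```python
from math import comb


def hyperexp_min_mean(r, p, mu1, mu2):
    """Expected minimum of r i.i.d. hyperexponential service times.

    Each service time is Exp(mu1) with probability p, else Exp(mu2).
    The tail p*exp(-mu1*x) + (1-p)*exp(-mu2*x) is log-convex.
    E[min] is the integral of tail(x)**r, expanded with the binomial theorem.
    """
    return sum(
        comb(r, k) * p**k * (1 - p) ** (r - k) / (k * mu1 + (r - k) * mu2)
        for k in range(r + 1)
    )


def shifted_exp_min_mean(r, delta, mu):
    """Expected minimum of r i.i.d. shifted exponentials delta + Exp(mu).

    The tail is log-concave: every copy needs at least delta time units,
    and the minimum of r Exp(mu) variables is Exp(r*mu).
    """
    return delta + 1.0 / (r * mu)


def latency_and_cost(min_mean, r):
    """Return (latency, cost) for r simultaneous copies, no queueing.

    Assumption: cost is total server time, i.e. r servers each busy
    until the earliest copy finishes.
    """
    latency = min_mean(r)
    return latency, r * latency


if __name__ == "__main__":
    for r in (1, 2, 4):
        lat_hx, cost_hx = latency_and_cost(
            lambda n: hyperexp_min_mean(n, p=0.9, mu1=10.0, mu2=1.0), r
        )
        lat_se, cost_se = latency_and_cost(
            lambda n: shifted_exp_min_mean(n, delta=1.0, mu=1.0), r
        )
        print(f"r={r}: hyperexp latency={lat_hx:.4f} cost={cost_hx:.4f} | "
              f"shifted exp latency={lat_se:.4f} cost={cost_se:.4f}")
```

In this simplified setting, more copies lower both latency and cost for the hyperexponential example, whereas for the shifted exponential example latency falls but cost grows with r, which is the regime where less redundancy becomes attractive.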
