Reducing tail latency using duplication: a multi-layered approach

Duplication can be a powerful strategy for overcoming stragglers in cloud services, but it is often used conservatively because of the risk of overloading the system. We call for making duplication a first-class concept in cloud systems and make two contributions toward that goal. First, we present duplicate-aware scheduling (DAS), an aggressive duplication policy that duplicates every job but keeps the system safe by providing suitable support (prioritization and purging) at multiple layers of the cloud system. Second, we present the D-Stage abstraction, which supports DAS and other duplication policies across diverse layers of a cloud system (e.g., network and storage). The D-Stage abstraction decouples the duplication policy from the mechanism and facilitates working with legacy layers of a system. Using this abstraction, we evaluate the benefits of DAS for two data-parallel applications (HDFS and an in-memory workload generator) and a network function (a Snort-based IDS cluster). Our experiments on the public cloud and Emulab show that DAS is safe to use and that the tail latency improvement holds across a wide range of workloads.
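
The two safety mechanisms named above, prioritization and purging, can be illustrated with a small sketch. It is not the paper's implementation: it assumes that prioritization means strict priority of primary copies over duplicates at each server, and that purging means removing a job's remaining queued copy once any copy completes. The names `Server`, `DasScheduler`, `submit`, and `complete` are hypothetical, as is the round-robin placement of copies.

```python
from collections import deque

PRIMARY, DUPLICATE = 0, 1  # assumed priority classes; lower value is served first


class Server:
    """A worker that always serves queued primaries before queued duplicates."""

    def __init__(self, name):
        self.name = name
        self.queues = {PRIMARY: deque(), DUPLICATE: deque()}

    def enqueue(self, job_id, kind):
        self.queues[kind].append(job_id)

    def next_job(self):
        # Strict priority: a duplicate runs only when no primary is waiting.
        for kind in (PRIMARY, DUPLICATE):
            if self.queues[kind]:
                return self.queues[kind].popleft(), kind
        return None

    def purge(self, job_id):
        # Drop any still-queued copy of a job that has already finished.
        for queue in self.queues.values():
            if job_id in queue:
                queue.remove(job_id)


class DasScheduler:
    """Duplicates every job and purges leftover copies on completion."""

    def __init__(self, servers):
        self.servers = servers
        self._next = 0  # round-robin cursor (a placement choice made for this sketch)

    def submit(self, job_id):
        n = len(self.servers)
        self.servers[self._next % n].enqueue(job_id, PRIMARY)
        self.servers[(self._next + 1) % n].enqueue(job_id, DUPLICATE)
        self._next += 1

    def complete(self, job_id):
        for server in self.servers:
            server.purge(job_id)


a, b = Server("a"), Server("b")
sched = DasScheduler([a, b])
sched.submit("j1")  # primary on a, duplicate on b
sched.submit("j2")  # primary on b, duplicate on a

first_on_a = a.next_job()  # ("j1", PRIMARY)
first_on_b = b.next_job()  # ("j2", PRIMARY): j1's duplicate arrived earlier but waits
sched.complete("j1")       # j1 finished on a, so its duplicate on b is purged
second_on_b = b.next_job() # None: nothing left to do on b
```

In this toy run the duplicate of `j1` never delays the primary `j2` on server `b`, and it is discarded as soon as `j1` completes elsewhere, which is the intuition behind duplicating every job without overloading the system.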
