Aequitas: admission control for performance-critical RPCs in datacenters

With the increasing popularity of disaggregated storage and microservice architectures, high fan-out and fan-in Remote Procedure Calls (RPCs) now generate most of the traffic in modern datacenters. While the network plays a crucial role in RPC performance, traditional traffic classification categories cannot adequately capture the importance of individual RPCs because RPC characteristics vary widely. As a result, meeting service-level objectives (SLOs), especially for performance-critical (PC) RPCs, remains challenging. We present Aequitas, a distributed, sender-driven admission control scheme that uses commodity Weighted Fair Queuing (WFQ) to guarantee RPC-level SLOs. When the network is overloaded, Aequitas enforces cluster-wide RPC latency SLOs by limiting the amount of traffic admitted into each QoS class and downgrading the rest to a lower class. We show analytically and empirically that this simple scheme works well. When network demand spikes beyond provisioned capacity, Aequitas achieves a 99.9th-percentile latency SLO that is 3.8× lower than that of state-of-the-art congestion control, and admits up to 2× more PC RPCs that meet their SLO than pFabric, QJump, D3, PDQ, and Homa. In our fleetwide production deployment, Aequitas improves latency by 10%.
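The abstract describes the mechanism only at a high level: each sender admits part of its traffic at the requested QoS and downgrades the rest, while commodity WFQ in the switches gives each QoS class a weighted share of the link. The Python sketch below illustrates that idea under stated assumptions and is not the authors' algorithm. The QoS constants, the WFQ weights, `AdmissionController`, its parameters (`slo_us`, `alpha`, `beta`, `p_floor`), and the choice to tune a per-destination admit probability with AIMD on observed RPC latency are all hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical QoS levels (lower number = higher priority).
# Downgraded RPCs are sent on QOS_LOW.
QOS_HIGH, QOS_MID, QOS_LOW = 0, 1, 2


def wfq_guaranteed_share(weights: dict[int, float], qos: int) -> float:
    """Fraction of link rate WFQ (approximately) guarantees a backlogged class.

    WFQ approximates generalized processor sharing, under which class i
    receives at least w_i / sum_j w_j of the link rate whenever it has
    traffic queued, regardless of how much the other classes send.
    """
    return weights[qos] / sum(weights.values())


@dataclass
class AdmissionController:
    """Hypothetical sender-side admission state for one (destination, QoS) pair."""

    slo_us: float          # latency target for RPCs on this QoS (assumed input)
    alpha: float = 0.01    # additive increase of p_admit per RPC that meets the SLO
    beta: float = 0.05     # fractional decrease of p_admit per SLO miss
    p_floor: float = 0.01  # lower bound so the class is never fully starved
    p_admit: float = 1.0   # probability of sending an RPC at its requested QoS

    def choose_qos(self, requested_qos: int, rng: random.Random) -> int:
        """Admit at the requested QoS with probability p_admit, else downgrade."""
        if rng.random() < self.p_admit:
            return requested_qos
        return QOS_LOW

    def on_completion(self, latency_us: float) -> None:
        """AIMD update from the measured latency of an RPC sent at the requested QoS."""
        if latency_us <= self.slo_us:
            self.p_admit = min(1.0, self.p_admit + self.alpha)
        else:
            self.p_admit = max(self.p_floor, self.p_admit * (1.0 - self.beta))


if __name__ == "__main__":
    rng = random.Random(42)
    weights = {QOS_HIGH: 8.0, QOS_MID: 4.0, QOS_LOW: 1.0}  # hypothetical WFQ weights
    print(f"guaranteed share of QOS_HIGH: {wfq_guaranteed_share(weights, QOS_HIGH):.3f}")

    ctrl = AdmissionController(slo_us=50.0)
    for latency in [30.0] * 5 + [120.0] * 20:  # SLO met, then sustained overload
        ctrl.on_completion(latency)
    sent = [ctrl.choose_qos(QOS_HIGH, rng) for _ in range(1000)]
    print(f"p_admit={ctrl.p_admit:.3f}, downgraded={sent.count(QOS_LOW)} of 1000")
```

AIMD on an admit probability is one natural design choice here: it backs off quickly when SLO misses suggest a class is carrying more than its WFQ share, and it probes upward slowly when latency is within target. The floor keeps a trickle of traffic at the requested QoS so the sender can keep measuring latency there.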

[1] Gautam Kumar, et al. Swift: Delay is Simple and Effective for Congestion Control in the Datacenter, 2020, SIGCOMM.

[2] M. Alizadeh, et al. Overload Control for µs-scale RPCs with Breakwater, 2020, OSDI.

[3] Minlan Yu, et al. HPCC: high precision congestion control, 2019, SIGCOMM.

[4] Zartash Afzal Uzmi, et al. Workload adaptive flow scheduling, 2018, CoNEXT.

[5] Amin Vahdat, et al. Sincronia: near-optimal network design for coflows, 2018, SIGCOMM.

[6] Yong Wang, et al. Overload Control for Scaling WeChat Microservices, 2018, SoCC.

[7] John K. Ousterhout, et al. Homa: a receiver-driven low-latency transport protocol using network priorities, 2018, SIGCOMM.

[8] Mor Harchol-Balter, et al. WorkloadCompactor: reducing datacenter cost while providing tail latency SLO guarantees, 2017, SoCC.

[9] Wei Bai, et al. Information-Agnostic Flow Scheduling for Commodity Data Centers, 2015, NSDI.

[10] Hari Balakrishnan, et al. Flexplane: An Experimentation Platform for Resource Management in Datacenters, 2017, NSDI.

[11] Mor Harchol-Balter, et al. SNC-Meister: Admitting More Tenants with Tail Latency SLOs, 2016, SoCC.

[12] Sandeep Chinchali, et al. NUMFabric: Fast and Flexible Bandwidth Allocation in Datacenters, 2016, SIGCOMM.

[13] Zhenhua Liu, et al. HUG: Multi-Resource Fairness for Correlated and Elastic Demands, 2016, NSDI.

[14] Gautam Kumar, et al. pHost: distributed near-optimal datacenter transport over commodity network fabric, 2015, CoNEXT.

[15] Randy H. Katz, et al. FastLane: making short flows shorter with agile drop notification, 2015, SoCC.

[16] Amin Vahdat, et al. BwE: Flexible, Hierarchical Bandwidth Allocation for WAN Distributed Computing, 2015, Comput. Commun. Rev.

[17] Amin Vahdat, et al. TIMELY: RTT-based Congestion Control for the Datacenter, 2015, Comput. Commun. Rev.

[18] Ming Zhang, et al. Congestion Control for Large-Scale RDMA Deployments, 2015, Comput. Commun. Rev.

[19] Justine Sherry, et al. Silo: Predictable Message Latency in the Cloud, 2015, Comput. Commun. Rev.

[20] Ion Stoica, et al. Efficient Coflow Scheduling Without Prior Knowledge, 2015, SIGCOMM.

[21] Robert N. M. Watson, et al. Queues Don't Matter When You Can JUMP Them!, 2015, NSDI.

[22] Devavrat Shah, et al. Fastpass, 2014, SIGCOMM.

[23] Mor Harchol-Balter, et al. PriorityMeister: Tail Latency QoS for Shared Networked Storage, 2014, SoCC.

[24] Hitesh Ballani, et al. End-to-end Performance Isolation Through Virtual Datacenters, 2014, OSDI.

[25] Antony I. T. Rowstron, et al. Decentralized task-aware scheduling for data center networks, 2014, SIGCOMM.

[26] Ion Stoica, et al. Efficient coflow scheduling with Varys, 2014, SIGCOMM.

[27] Antony I. T. Rowstron, et al. IOFlow: a software-defined storage architecture, 2013, SOSP.

[28] Nick McKeown, et al. pFabric: minimal near-optimal datacenter transport, 2013, SIGCOMM.

[29] Sujata Banerjee, et al. ElasticSwitch: practical work-conserving bandwidth guarantees for cloud computing, 2013, SIGCOMM.

[30] Srikanth Kandula, et al. Achieving high utilization with software-driven WAN, 2013, SIGCOMM.

[31] Dinan Gunawardena, et al. Chatty Tenants and the Cloud Network Sharing Problem, 2013, NSDI.

[32] Albert G. Greenberg, et al. EyeQ: Practical Network Performance Isolation at the Edge, 2013, NSDI.

[33] Randy H. Katz, et al. Cake: enabling high-level SLOs on shared storage systems, 2012, SoCC.

[34] Anees Shaikh, et al. Performance Isolation and Fairness for Multi-Tenant Cloud Storage, 2012, OSDI.

[35] Brighten Godfrey, et al. Finishing flows quickly with preemptive scheduling, 2012, Comput. Commun. Rev.

[36] Zhou Yu, et al. A Priority-Based Weighted Fair Queueing Algorithm in Wireless Sensor Network, 2012, 8th International Conference on Wireless Communications, Networking and Mobile Computing.

[37] A. Rowstron, et al. Towards predictable datacenter networks, 2011, SIGCOMM.

[38] Antony I. T. Rowstron, et al. Better never than late: meeting deadlines in datacenter networks, 2011, SIGCOMM.

[39] Albert G. Greenberg, et al. Sharing the Data Center Network, 2011, NSDI.

[40] Gautam Kumar, et al. FairCloud: sharing the network in cloud computing, 2011, Comput. Commun. Rev.

[41] Helen J. Wang, et al. SecondNet: a data center network virtualization architecture with bandwidth guarantees, 2010, CoNEXT.

[42] Albert G. Greenberg, et al. Data center TCP (DCTCP), 2010, SIGCOMM.

[43] Donald Beaver, et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, 2010.

[44] Scott Shenker, et al. Approximate fairness through differential dropping, 2003, Comput. Commun. Rev.

[45] David E. Culler, et al. Overload management as a fundamental service design primitive, 2002, EW 10.

[46] Mor Harchol-Balter, et al. Analysis of SRPT scheduling: investigating unfairness, 2001, SIGMETRICS.

[47] Konstantinos Psounis, et al. CHOKe - a stateless active queue management scheme for approximating fair bandwidth allocation, 2000, IEEE INFOCOM.

[48] George C. Polyzos, et al. SCED: A Generalized Scheduling Policy for Guaranteeing Quality-of-Service, 1999.

[49] Ion Stoica, et al. A hierarchical fair service curve algorithm for link-sharing, real-time and priority services, 1997, SIGCOMM.

[50] John Wroclawski, et al. The Use of RSVP with IETF Integrated Services, 1997, RFC.

[51] H. Vin, et al. Start-time fair queueing: a scheduling algorithm for integrated services packet switching networks, 1996, SIGCOMM.

[52] George Varghese, et al. Efficient fair queueing using deficit round robin, 1995, SIGCOMM.

[53] S. Jamaloddin Golestani, et al. Network Delay Analysis of a Class of Fair Queueing Algorithms, 1995, IEEE J. Sel. Areas Commun.

[54] Sally Floyd, et al. Random early detection gateways for congestion avoidance, 1993, TNET.

[55] Abhay Parekh, et al. A generalized processor sharing approach to flow control in integrated services networks: the single node case, 1992, IEEE INFOCOM.

[56] Rene L. Cruz, et al. Service Burstiness and Dynamic Burstiness Measures: A Framework, 1992, J. High Speed Networks.

[57] Rene L. Cruz, et al. A calculus for network delay, Part I: Network elements in isolation, 1991, IEEE Trans. Inf. Theory.

[58] Scott Shenker, et al. Analysis and simulation of a fair queueing algorithm, 1989, SIGCOMM.

[59] Raj Jain, et al. Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks, 1989, Comput. Networks.

[60] V. Jacobson, et al. Congestion avoidance and control, 1988, SIGCOMM.

[61] J. Little. A Proof for the Queuing Formula: L = λW, 1961.