zD: a scalable zero-drop network stack at end hosts

Modern end-host network stacks have to handle traffic from tens of thousands of flows and hundreds of virtual machines per single host in order to keep up with the scale of modern clouds. This can cause congestion for traffic egressing from the end host. The effects of this congestion have received little attention. Currently, an overflowing queue, like a kernel queuing discipline, will drop incoming packets. Packet drops degrade both network and CPU performance: they inflate the time to transmit a packet and waste CPU cycles on retransmissions. In this paper, we show that current end-host mechanisms can lead to high CPU utilization, high tail latency, and low throughput when egress traffic is congested within the end host. We present zD, a framework for applying backpressure from a congested queue to traffic sources at end hosts that can scale to thousands of flows. We implement zD to apply backpressure in two settings: i) between TCP sources and the kernel queuing discipline, and ii) between VMs as traffic sources and the kernel queuing discipline in the hypervisor. zD improves throughput by up to 60% and improves tail RTT by at least 10x at high loads, compared to the standard kernel implementation.
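To make the backpressure idea concrete, the following is a minimal, self-contained C sketch (not the authors' zD implementation; names such as zd_queue and zd_enqueue are hypothetical). It models a bounded egress queue that, when full, blocks the enqueueing traffic source until the queue drains, instead of dropping packets as a tail-drop queuing discipline would.

/*
 * Conceptual sketch of queue-to-source backpressure, assuming a simple
 * bounded ring buffer shared between producer "sources" and one drain
 * thread. This is an illustration of the idea only, not the kernel
 * mechanism described in the paper.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define QLEN 8          /* queue capacity, analogous to a qdisc limit */

struct zd_queue {
    int buf[QLEN];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_full;   /* wakes blocked sources (backpressure) */
    pthread_cond_t not_empty;  /* wakes the drain thread */
};

static struct zd_queue q = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
};

/* Enqueue with backpressure: block the caller while the queue is full,
 * rather than dropping the packet. */
static void zd_enqueue(int pkt)
{
    pthread_mutex_lock(&q.lock);
    while (q.count == QLEN)
        pthread_cond_wait(&q.not_full, &q.lock);
    q.buf[q.tail] = pkt;
    q.tail = (q.tail + 1) % QLEN;
    q.count++;
    pthread_cond_signal(&q.not_empty);
    pthread_mutex_unlock(&q.lock);
}

/* Dequeue one packet, waking one blocked source once space frees up. */
static int zd_dequeue(void)
{
    pthread_mutex_lock(&q.lock);
    while (q.count == 0)
        pthread_cond_wait(&q.not_empty, &q.lock);
    int pkt = q.buf[q.head];
    q.head = (q.head + 1) % QLEN;
    q.count--;
    pthread_cond_signal(&q.not_full);
    pthread_mutex_unlock(&q.lock);
    return pkt;
}

static void *source(void *arg)
{
    int id = *(int *)arg;
    for (int i = 0; i < 32; i++)
        zd_enqueue(id * 100 + i);   /* blocks under congestion */
    return NULL;
}

int main(void)
{
    pthread_t sources[4];
    int ids[4] = { 1, 2, 3, 4 };

    for (int i = 0; i < 4; i++)
        pthread_create(&sources[i], NULL, source, &ids[i]);

    /* A slow drain simulates a congested egress link. */
    for (int i = 0; i < 4 * 32; i++) {
        printf("tx pkt %d\n", zd_dequeue());
        usleep(1000);
    }

    for (int i = 0; i < 4; i++)
        pthread_join(sources[i], NULL);
    return 0;
}

Compiled with `cc -pthread`, the sources stall whenever the queue is full and resume as the drain frees slots, so no packet is ever dropped; the actual zD design applies this pausing to TCP sockets and VM virtio rings rather than to threads.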
