Titan: Fair Packet Scheduling for Commodity Multiqueue NICs

The performance of an OS's networking stack can be measured by its achieved throughput, CPU utilization, latency, and per-flow fairness. To drive line rates of 10 Gbps and beyond, modern OS networking stacks rely on a number of important hardware and software optimizations, including but not limited to multiple transmit and receive queues and segmentation offloading. Unfortunately, it is not clear how best to leverage these optimizations to extract performance. The first contribution of this paper is a detailed empirical study of the impact of different OS and NIC configurations on this four-dimensional trade-off space. We find that enabling certain features is crucial for latency, CPU utilization, and throughput. However, even with these features enabled, substantial flow-level unfairness remains. The second contribution of this paper is Titan, an extension to the Linux networking stack that systematically addresses the unfairness arising under different operating conditions while minimally impacting CPU utilization, latency, and throughput.
