Fast, scalable, and programmable packet scheduler in hardware

With link speeds increasing and CPU speed scaling slowing down, packet scheduling in software suffers from lower precision and higher CPU utilization. Offloading packet scheduling to hardware such as a NIC can potentially overcome these drawbacks. However, to retain the flexibility of software packet schedulers, a packet scheduler in hardware must be programmable while also being fast and scalable. State-of-the-art packet schedulers in hardware compromise either on scalability (Push-In-First-Out (PIFO)) or on the ability to express a wide range of packet scheduling algorithms (First-In-First-Out (FIFO)). Further, even a general scheduling primitive like PIFO cannot express certain key classes of packet scheduling algorithms. In this paper, we therefore propose a generalization of the PIFO primitive called Push-In-Extract-Out (PIEO). Like PIFO, PIEO maintains an ordered list of elements; unlike PIFO, which only allows dequeue from the head of the list, PIEO allows dequeue from arbitrary positions in the list by supporting programmable predicate-based filtering at dequeue. We then present a fast and scalable hardware design of the PIEO scheduler and prototype it on an FPGA. Overall, the PIEO scheduler is both more expressive and over 30× more scalable than PIFO.
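
To make the difference between the two primitives concrete, the following is a minimal software sketch of the PIEO semantics described above: elements are kept ordered by rank, and dequeue extracts the smallest-ranked element that satisfies a caller-supplied predicate. It is an illustrative model only, not the hardware design presented in the paper. The class name `PIEO`, the method names, and the `eligible_at` field in the usage example are assumptions introduced here for illustration.

```python
import bisect
from typing import Any, Callable, List, Optional, Tuple


class PIEO:
    """Software model of a Push-In-Extract-Out ordered list (illustrative only)."""

    def __init__(self) -> None:
        # Entries are (rank, sequence number, element). The sequence number
        # breaks ties in arrival order, so elements are never compared directly.
        self._items: List[Tuple[int, int, Any]] = []
        self._seq = 0

    def __len__(self) -> int:
        return len(self._items)

    def enqueue(self, rank: int, element: Any) -> None:
        """Push-in: insert the element at the position given by its rank."""
        bisect.insort(self._items, (rank, self._seq, element))
        self._seq += 1

    def dequeue(
        self, predicate: Callable[[Any], bool] = lambda e: True
    ) -> Optional[Any]:
        """Extract-out: remove and return the smallest-ranked element for which
        predicate(element) is true, or None if no element qualifies.
        With the default predicate this behaves like a PIFO (head-only dequeue).
        """
        for i, (_, _, element) in enumerate(self._items):
            if predicate(element):
                del self._items[i]
                return element
        return None


# Usage: hypothetical packets that carry an eligibility time (assumed field).
q = PIEO()
q.enqueue(rank=10, element={"flow": "A", "eligible_at": 5})
q.enqueue(rank=20, element={"flow": "B", "eligible_at": 0})
q.enqueue(rank=30, element={"flow": "C", "eligible_at": 0})

now = 1
picked = q.dequeue(lambda e: e["eligible_at"] <= now)  # skips A, extracts B
head = q.dequeue()                                      # PIFO-style: extracts A
```

The linear scan in `dequeue` is chosen for clarity; it says nothing about how a hardware implementation would achieve the speed and scalability claimed in the abstract.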
