RoCC: robust congestion control for RDMA

We present RoCC, a robust congestion control approach for RDMA-based datacenter networks. RoCC uses the switch queue size as the input to a proportional-integral (PI) controller, which computes the fair data rate for the flows sharing the queue; the switch then signals this rate to the flow sources. The PI parameters are self-tuning to guarantee stability, rapid convergence, and fair, near-optimal throughput across a wide range of congestion scenarios. Our simulation and DPDK implementation results show that, compared to DCQCN, RoCC generates up to 7× fewer PFC frames under high average load. RoCC also achieves up to 8× lower tail latency than DCQCN and HPCC. We also find that RoCC does not require PFC. The functional components of RoCC are implementable in P4-based and fixed-function switch ASICs.
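To make the control loop concrete, the following is a minimal sketch of a generic discrete-time PI controller driven by queue depth. It is not RoCC's actual algorithm: the class name `QueuePIController`, the fields `q_ref`, `alpha`, `beta`, `min_rate`, `max_rate`, and the incremental update rule are assumptions chosen for illustration, and the gains are fixed here, whereas the abstract states that RoCC tunes them automatically.

```python
from dataclasses import dataclass


@dataclass
class QueuePIController:
    """Hypothetical PI controller mapping queue depth to a per-flow rate.

    Illustrative only; names, units, and gains are assumptions, not RoCC's.
    """
    q_ref: float         # target queue depth (assumed unit: bytes)
    alpha: float         # gain on queue error (integral term, incremental form)
    beta: float          # gain on queue change (proportional term)
    min_rate: float      # lower clamp on the advertised rate
    max_rate: float      # upper clamp on the advertised rate
    rate: float          # current advertised fair rate
    q_prev: float = 0.0  # queue depth seen at the previous update

    def update(self, q_now: float) -> float:
        """Return the new fair rate given the current queue depth."""
        error = q_now - self.q_ref    # how far the queue is above the target
        growth = q_now - self.q_prev  # how fast the queue is building up
        # Incremental PI: lower the rate when the queue is above target
        # or growing, raise it when the queue is below target or draining.
        self.rate -= self.alpha * error + self.beta * growth
        # Clamp so the advertised rate stays within the allowed range.
        self.rate = min(self.max_rate, max(self.min_rate, self.rate))
        self.q_prev = q_now
        return self.rate


# Example with arbitrary numbers: the queue jumps above target, then drains.
ctrl = QueuePIController(q_ref=100.0, alpha=0.5, beta=1.0,
                         min_rate=1.0, max_rate=1000.0,
                         rate=500.0, q_prev=100.0)
rates = [ctrl.update(q) for q in (200.0, 100.0, 0.0)]
```

In this sketch the rate falls while the queue sits above `q_ref` or is growing, and recovers as the queue drains. In a real deployment the switch would run the update periodically and carry the resulting rate to the flow sources in a feedback message, details the abstract does not specify.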
