FUSO: Fast Multi-Path Loss Recovery for Data Center Networks

To achieve low TCP flow completion time (FCT) in data center networks (DCNs), it is critical, and challenging, to recover from packet loss rapidly without adding extra congestion. In this paper, we propose a novel loss recovery approach, fast multi-path loss recovery (FUSO), that exploits the multi-path diversity of DCNs for transport loss recovery. In FUSO, when a multi-path transport sender suspects loss on one sub-flow, it immediately sends recovery packets over another sub-flow that is loss-free or less lossy and has spare congestion window slots. FUSO is fast in that it does not need to wait for a timeout on the lossy sub-flow, and it is cautious in that it does not violate the congestion control algorithm. Testbed experiments and simulations show that FUSO decreases the 99th-percentile FCT of latency-sensitive flows by up to approximately 82.3% in a 1-Gb/s testbed, and by up to approximately 87.9% in a 10-Gb/s large-scale simulated network.
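
The recovery rule stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `SubFlow` fields, the `loss_rate` estimate, and the function names `pick_recovery_path` and `recover` are all hypothetical, and the abstract does not specify how FUSO detects suspected loss or ranks sub-flows. The sketch only shows the two stated conditions: a recovery packet goes to another sub-flow that is less lossy, and only if that sub-flow has a spare congestion window slot.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SubFlow:
    """Hypothetical per-sub-flow sender state (all fields are assumptions)."""
    name: str
    cwnd: int                 # congestion window, in packets
    in_flight: int            # packets sent but not yet acknowledged
    loss_rate: float          # assumed smoothed loss estimate in [0, 1]
    suspected_lost: List[int] = field(default_factory=list)

    def spare_window(self) -> int:
        # Slots the congestion controller still allows on this sub-flow.
        return max(0, self.cwnd - self.in_flight)


def pick_recovery_path(lossy: SubFlow, subflows: List[SubFlow]) -> Optional[SubFlow]:
    """Return the least lossy other sub-flow that still has a spare window slot."""
    candidates = [
        s for s in subflows
        if s is not lossy
        and s.loss_rate < lossy.loss_rate
        and s.spare_window() > 0
    ]
    return min(candidates, key=lambda s: s.loss_rate) if candidates else None


def recover(lossy: SubFlow, subflows: List[SubFlow]) -> List[Tuple[int, str]]:
    """Send suspected-lost packets over other sub-flows without exceeding any cwnd."""
    sent = []
    while lossy.suspected_lost:
        path = pick_recovery_path(lossy, subflows)
        if path is None:
            break  # cautious: no spare slot anywhere, so do not add load
        seq = lossy.suspected_lost.pop(0)
        path.in_flight += 1  # the recovery packet consumes one window slot
        sent.append((seq, path.name))
    return sent


a = SubFlow("a", cwnd=10, in_flight=4, loss_rate=0.2, suspected_lost=[7, 8, 9])
b = SubFlow("b", cwnd=5, in_flight=4, loss_rate=0.0)
c = SubFlow("c", cwnd=6, in_flight=5, loss_rate=0.01)
result = recover(a, [a, b, c])
print(result)
```

In this example, sub-flows `b` and `c` each have one spare slot, so two of the three suspected-lost packets are sent immediately and the third stays queued on `a` until a window slot opens, rather than being sent in violation of congestion control.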
