Taming Latency in Data Centers via Active Congestion-Probing

In cloud environments, interactive applications deployed in data centers often generate swarms of short-lived data transfers (or flows) that compete intensely for scarce switch buffer space with other short-lived flows as well as with long-lived ones. In the presence of bloated queues, such short-lived flows often experience multiple packet losses per round-trip time, which frequently triggers the timeout-based loss recovery mechanism. As a direct consequence, application response times become inflated, often by orders of magnitude. Data Center TCP (DCTCP) was designed specifically to address this issue; however, it does not account for coexistence with other transport protocols (e.g., the CUBIC and NewReno implementations in Linux). In such settings, which are common in multi-tenant data centers, the large initial congestion windows of legacy TCP (e.g., 10 segments) induce multiple packet losses at the onset of a flow, forcing timeouts and even binary exponential backoff. In this paper, we propose a novel hypervisor-based, application-transparent approach to active congestion probing that enables the hypervisor to infer on-path congestion for new traffic before the TCP connection is fully established, thereby avoiding such massive packet losses and timeouts. The resulting mechanism, ProBoSCIS, requires no changes to TCP, works with all TCP variants, and needs no network hardware features beyond those already available in today's commodity data center switches. We show its effectiveness via ns-2 simulations and demonstrate its practical feasibility by implementing and deploying it in a small-scale data center testbed, where a series of real experiments shows that adopting ProBoSCIS significantly reduces application latency.
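The latency inflation described above can be made concrete with a little arithmetic. The Python sketch below is a back-of-envelope model only and does not implement ProBoSCIS: it estimates how many packets of a synchronized initial-window burst overflow a shared switch buffer, and how much a short flow's completion time grows once it must fall back on retransmission timeouts with binary exponential backoff (the timer doubles after each expiration, as specified in RFC 6298). The 200 ms minimum RTO matches the Linux default; the 100 µs round-trip time, the buffer sizes, and all function names are hypothetical values chosen for illustration.

```python
# Back-of-envelope model of the failure mode motivating ProBoSCIS.
# This is NOT the ProBoSCIS mechanism; all names and the RTT value are hypothetical.

RTT_US = 100            # assumed data center round-trip time in microseconds
RTO_MIN_US = 200_000    # Linux default minimum retransmission timeout (200 ms)
INIT_CWND = 10          # initial congestion window in segments (as in the text)


def burst_drops(new_flows: int, occupied_pkts: int, buffer_pkts: int,
                init_cwnd: int = INIT_CWND) -> int:
    """Packets dropped when `new_flows` flows each send a full initial window
    into a switch port buffer that already holds `occupied_pkts` packets.
    Worst case: assumes the port drains nothing while the burst arrives."""
    free = max(0, buffer_pkts - occupied_pkts)
    return max(0, new_flows * init_cwnd - free)


def timeout_penalty_us(consecutive_timeouts: int, rto_us: int = RTO_MIN_US) -> int:
    """Total idle time spent on back-to-back RTO expirations, doubling the
    timer after each one (binary exponential backoff; upper RTO cap ignored)."""
    total = 0
    for _ in range(consecutive_timeouts):
        total += rto_us
        rto_us *= 2
    return total


def inflation_factor(flow_rtts: int, consecutive_timeouts: int) -> float:
    """Completion time with timeouts divided by completion time without them,
    for a flow that would otherwise finish in `flow_rtts` round trips."""
    base_us = flow_rtts * RTT_US
    return (base_us + timeout_penalty_us(consecutive_timeouts)) / base_us


if __name__ == "__main__":
    # Four flows start at once into a 100-packet buffer already holding 80 packets
    # queued by long-lived flows: 40 new packets compete for 20 free slots.
    print(burst_drops(new_flows=4, occupied_pkts=80, buffer_pkts=100))  # 20
    # A flow needing 3 RTTs (300 us) that hits one timeout takes 200.3 ms instead.
    print(round(inflation_factor(flow_rtts=3, consecutive_timeouts=1)))  # 668
    # Three consecutive timeouts cost 200 + 400 + 800 ms = 1.4 s of waiting.
    print(timeout_penalty_us(3))  # 1400000
```

Under these assumptions a single timeout slows a three-RTT flow by roughly 670 times, which is consistent with the orders-of-magnitude inflation in response time described above.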