Queue length feedback-based solution of TCP Incast in data center networks

The Internet offers a large number of applications and services that we use on a daily basis. These widely used applications are hosted on large-scale, high-performance computing systems called data centers. TCP performs poorly under many-to-one communication, a common traffic pattern in data center networks. When many senders transmit to a single receiver at once, the resulting packet losses and subsequent timeouts cause throughput to collapse; this problem is known as TCP Incast. In this paper, we present a queue length feedback-based solution to mitigate TCP Incast. The scheme has two parts: i) a novel queue length-based congestion parameter, which accurately measures congestion along the path from source to destination, and ii) a congestion control scheme that uses this parameter to prevent throughput collapse under Incast traffic patterns. We compare the results with TCP and DCTCP, the two most common transport protocols deployed in data center networks. The results show that the proposed scheme minimizes packet drops and achieves high utilization and burst tolerance.
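
The abstract does not give the exact form of the congestion parameter or the control law, so the following is only a minimal illustrative sketch of how queue length feedback could drive a sender's window. Everything in it is an assumption made for illustration and not the paper's method: the `Hop` type, taking the maximum queue occupancy over the path as the congestion parameter, the 0.5 threshold, and the proportional decrease rule.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    queue_len: int  # packets currently queued at this switch port
    capacity: int   # buffer size of the port, in packets


def path_congestion(hops):
    """Hypothetical congestion parameter: the highest queue occupancy
    (0.0 = empty, 1.0 = full) observed on any hop of the path."""
    return max(h.queue_len / h.capacity for h in hops)


def next_cwnd(cwnd, congestion, threshold=0.5, min_cwnd=1.0):
    """Hypothetical window update driven by the fed-back parameter.

    At or below the threshold, grow the window by one segment.
    Above it, shrink the window in proportion to how far the
    occupancy exceeds the threshold (up to halving at a full queue).
    """
    if congestion <= threshold:
        return cwnd + 1.0
    overshoot = (congestion - threshold) / (1.0 - threshold)
    return max(min_cwnd, cwnd * (1.0 - 0.5 * overshoot))


if __name__ == "__main__":
    path = [Hop(queue_len=10, capacity=100), Hop(queue_len=80, capacity=100)]
    c = path_congestion(path)
    print(f"congestion={c:.2f}, new cwnd={next_cwnd(10.0, c):.2f}")
```

In this sketch the sender backs off gradually as the bottleneck queue fills, before the buffer overflows, rather than waiting for packet loss. That is the general motivation for queue length feedback under Incast, where loss is typically followed by a timeout.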
