Preventing TCP incast throughput collapse at the initiation, continuation, and termination

Incast applications have grown in popularity with the advancement of data center technology. TCP incast can suffer from throughput collapse, a consequence of TCP retransmission timeouts that occur when the bottleneck buffer is overwhelmed and packets are lost. This is critical to the Quality of Service of cloud computing applications. Although previous literature has proposed solutions, the problem is not yet completely solved. In this paper, we investigate three root causes of the poor performance of TCP incast flows and propose three solutions, one each for the beginning, the middle, and the end of a TCP connection. The three solutions are: admission control of TCP flows, so that the flow population does not exceed the network's capacity; timestamp-based retransmission, to detect the loss of retransmitted packets; and reiterated FIN packets, to keep the TCP connection active until the termination of a session is acknowledged. Together, these solutions prevent throughput collapse. Their main idea is to ensure that all ongoing TCP incast flows maintain self-clocking, which eliminates the need to resort to retransmission timeouts for recovery. We evaluate these solutions and find that they work well in preventing retransmission timeouts of TCP incast flows, and hence in preventing throughput collapse.
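The admission-control idea can be illustrated with a minimal sketch. It is not the paper's mechanism, whose details the abstract does not give. The class name `FlowAdmissionController`, the parameters `buffer_packets` and `window_packets`, and the rule "admit at most buffer size divided by per-flow window" are all assumptions made here for illustration.

```python
from collections import deque


class FlowAdmissionController:
    """Toy admission control: cap concurrent incast flows by buffer capacity.

    Hypothetical model, not taken from the paper: each admitted flow may
    burst up to `window_packets` into a bottleneck buffer that holds
    `buffer_packets`.
    """

    def __init__(self, buffer_packets: int, window_packets: int):
        if buffer_packets <= 0 or window_packets <= 0:
            raise ValueError("sizes must be positive")
        # Assumed rule: the sum of the admitted flows' windows fits the buffer.
        self.max_flows = max(1, buffer_packets // window_packets)
        self.active = set()
        self.waiting = deque()

    def request(self, flow_id) -> bool:
        """Admit the flow if a slot is free; otherwise queue it."""
        if len(self.active) < self.max_flows:
            self.active.add(flow_id)
            return True
        self.waiting.append(flow_id)
        return False

    def finish(self, flow_id):
        """Release a slot and return the next flow admitted, if any."""
        self.active.discard(flow_id)
        if self.waiting and len(self.active) < self.max_flows:
            admitted = self.waiting.popleft()
            self.active.add(admitted)
            return admitted
        return None


# Usage: a 64-packet buffer and 16-packet windows allow 4 concurrent flows.
controller = FlowAdmissionController(buffer_packets=64, window_packets=16)
decisions = [controller.request(flow_id) for flow_id in range(6)]
next_flow = controller.finish(0)
```

In this sketch, flows beyond the cap wait in FIFO order and start only when an active flow finishes, so the flow population never exceeds what the assumed buffer can absorb.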
