Improving TCP congestion control for high bandwidth and long distance networks

Long distance networks spanning several continents are growing in importance, and many multinational companies are now centralizing their data centers for economic reasons. While high TCP performance in these networks is critical for the effective operation of these data centers, it is commonly reported that TCP substantially underutilizes network bandwidth in these environments. The performance problem of TCP in long distance networks can be largely attributed to three components of TCP congestion control: (1) slow growth of the congestion window after a congestion event during congestion avoidance, (2) poor start-up throughput due to burst packet losses during slow start, and (3) high susceptibility to non-congestion related losses due to the poor loss detection and recovery of standard TCP. Although TCP congestion control was created to support an early, fledgling Internet consisting of relatively low bandwidth and short distance networks, we show that TCP can embrace the current technology trend toward high-speed and long distance networks. Motivated by this aspiration, we explore solutions that address these challenges. This dissertation proposes three practical solutions, one for each problem, for improving TCP performance in high bandwidth and long distance networks: CUBIC, HyStart, and BLAST. They address orthogonal components of TCP congestion control, and thus they can be applied separately or in conjunction with each other. Specifically, CUBIC replaces the linear window growth function of existing TCP standards with a cubic function in order to improve the scalability of TCP while keeping the protocol TCP-friendly.
HyStart is a practical slow start algorithm that conforms to underlying layers and avoids immense system overload, which frequently leaves end systems unresponsive for an extended period while they recover from burst packet losses; HyStart finds a "safe" exit point at which slow start can finish and advance to the congestion avoidance phase without causing heavy packet loss. BLAST makes loss-based TCPs more resilient to non-congestion related losses by heuristically distinguishing them from congestion losses with high accuracy. BLAST is integrated with the loss detection and recovery path in Linux and outperforms existing loss-based and delay-based TCPs by an order of magnitude in throughput. The contributions of this dissertation are as follows. First, it starts from observing the performance issues of TCP on real end systems in high bandwidth and long distance networks. Specifically, we test the TCP stacks of popular operating systems in our realistic experimental testbeds and identify the functions of TCP that still require optimization. Second, we present three practical solutions that improve overall TCP performance in these environments. The proposed algorithms are designed with practical deployment in mind, and thus they are easy to understand and require modification of the TCP sender only. Finally, unlike most prior work, the proposed algorithms have been integrated into TCP stacks in Linux and have undergone extensive testing in lab testbeds and on the Internet. For example, CUBIC and HyStart have been integrated into Linux as default congestion control algorithms, and BLAST will be used to provide loss resilience in Cisco's Wide Area Application Services (WAAS) WAN optimizers.
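
As a rough illustration of the cubic window growth mentioned above, the following minimal sketch computes the congestion window as a function of the time elapsed since the last loss event. The abstract itself does not state the growth function; the form W(t) = C(t - K)^3 + W_max and the constants C = 0.4 and beta = 0.7 are assumptions taken from the published CUBIC specification (RFC 8312), and the function names are hypothetical.

```python
# Illustrative sketch only. The functional form and the constants below are
# assumptions taken from the published CUBIC specification (RFC 8312); they
# are not stated in this abstract. Function names are hypothetical.

C = 0.4     # scaling constant of the cubic curve (assumed)
BETA = 0.7  # window after a loss is BETA * w_max (assumed)


def cubic_k(w_max: float) -> float:
    """Seconds the cubic curve needs to climb back to w_max after a loss."""
    return (w_max * (1.0 - BETA) / C) ** (1.0 / 3.0)


def cubic_window(t: float, w_max: float) -> float:
    """Congestion window (in segments) t seconds after the last loss event."""
    d = t - cubic_k(w_max)
    return C * d ** 3 + w_max


if __name__ == "__main__":
    w_max = 100.0  # window size (segments) just before the loss, example value
    for t in (0.0, 1.0, 2.0, 4.0, 6.0, 8.0):
        print(f"t={t:4.1f}s  cwnd={cubic_window(t, w_max):8.2f}")
```

Under these assumptions the curve starts at BETA * w_max, flattens as it approaches the previous maximum w_max, and accelerates beyond it. Because the window depends on elapsed time rather than on the number of round trips, growth in this sketch does not slow down as the round-trip time increases.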