RepFlow: Minimizing flow completion times with replicated flows in data centers

Short TCP flows that are critical for many interactive applications in data centers are plagued by long flows and head-of-line blocking in switches. Hash-based load balancing schemes such as ECMP aggravate the matter and result in long-tailed flow completion times (FCT). Previous work on reducing FCT usually requires custom switch hardware and/or protocol changes. We propose RepFlow, a simple yet practically effective approach that replicates each short flow to reduce the completion times, without any change to switches or host kernels. With ECMP the original and replicated flows traverse distinct paths with different congestion levels, thereby reducing the probability of having long queueing delay. We develop a simple analytical model to demonstrate the potential improvement. Further, we conduct NS-3 simulations and Mininet implementation and show that RepFlow provides 50%-70% speedup in both mean and 99-th percentile FCT for all loads, and offers near-optimal FCT when used with DCTCP.

[1]  Amar Phanishayee,et al.  Safe and effective fine-grained TCP retransmissions for datacenter communication , 2009, SIGCOMM '09.

[2]  Nick McKeown,et al.  Reproducible network experiments using container-based emulation , 2012, CoNEXT '12.

[3]  Donald F. Towsley,et al.  Modeling TCP throughput: a simple model and its empirical validation , 1998, SIGCOMM '98.

[4]  Albert G. Greenberg,et al.  The nature of data center traffic: measurements & analysis , 2009, IMC '09.

[5]  R. Syski,et al.  Fundamentals of Queueing Theory , 1999, Technometrics.

[6]  Michael Mitzenmacher,et al.  The Power of Two Choices in Randomized Load Balancing , 2001, IEEE Trans. Parallel Distributed Syst..

[7]  Matthew Mathis,et al.  The macroscopic behavior of the TCP congestion avoidance algorithm , 1997, CCRV.

[8]  D. Zats,et al.  DeTail: reducing the flow completion time tail in datacenter networks , 2012, CCRV.

[9]  John S. Heidemann,et al.  Modeling the performance of HTTP over several transport protocols , 1997, TNET.

[10]  Amin Vahdat,et al.  Hedera: Dynamic Flow Scheduling for Data Center Networks , 2010, NSDI.

[11]  Brighten Godfrey,et al.  VeriFlow: verifying network-wide invariants in real time , 2012, HotSDN '12.

[12]  Jeffrey Dean,et al.  Achieving Rapid Response Times in Large Online Services , 2012 .

[13]  Christopher Stewart,et al.  Zoolander: Efficiently Meeting Very Strict, Low-Latency SLOs , 2013, ICAC.

[14]  Mark Handley,et al.  How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP , 2012, NSDI.

[15]  Albert G. Greenberg,et al.  Data center TCP (DCTCP) , 2010, SIGCOMM '10.

[16]  Amin Vahdat,et al.  Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center , 2012, NSDI.

[17]  Albert G. Greenberg,et al.  VL2: a scalable and flexible data center network , 2009, SIGCOMM '09.

[18]  Brighten Godfrey,et al.  Low latency via redundancy , 2013, CoNEXT.

[19]  Stefan Savage,et al.  Modeling TCP latency , 2000, Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064).

[20]  Amit Agarwal,et al.  An argument for increasing TCP's initial congestion window , 2010, CCRV.

[21]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[22]  Carl M. Harris,et al.  Fundamentals of Queueing Theory: Gross/Fundamentals of Queueing Theory , 2008 .

[23]  Antony I. T. Rowstron,et al.  Better never than late: meeting deadlines in datacenter networks , 2011, SIGCOMM.

[24]  Brighten Godfrey,et al.  Finishing flows quickly with preemptive scheduling , 2012, CCRV.

[25]  T. N. Vijaykumar,et al.  Deadline-aware datacenter tcp (D2TCP) , 2012, SIGCOMM '12.

[26]  Mark Handley,et al.  Improving datacenter performance and robustness with multipath TCP , 2011, SIGCOMM.

[27]  Guido Appenzeller,et al.  Sizing router buffers , 2004, SIGCOMM '04.

[28]  Constantinos Dovrolis,et al.  Beyond the Model of Persistent TCP Flows: Open-Loop vs Closed-Loop Arrivals of Non-persistent Flows , 2008, 41st Annual Simulation Symposium (anss-41 2008).

[29]  Frank Kelly,et al.  Notes on effective bandwidths , 1994 .

[30]  Zhe Wu,et al.  Understanding the latency benefits of multi-cloud webservice deployments , 2013, CCRV.

[31]  Srinivasan Seshan,et al.  Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems , 2008, FAST.

[32]  R. Srikant,et al.  Impact of File Arrivals and Departures on Buffer Sizing in Core Routers , 2011, IEEE/ACM Transactions on Networking.

[33]  Brighten Godfrey,et al.  More is less: reducing latency via redundancy , 2012, HotNets-XI.

[34]  Ramana Rao Kompella,et al.  On the impact of packet spraying in data center networks , 2013, 2013 Proceedings IEEE INFOCOM.

[35]  Scott Shenker,et al.  Usenix Association 10th Usenix Symposium on Networked Systems Design and Implementation (nsdi '13) 185 Effective Straggler Mitigation: Attack of the Clones , 2022 .

[36]  Nick McKeown,et al.  pFabric: minimal near-optimal datacenter transport , 2013, SIGCOMM.

[37]  Ali Munir,et al.  Minimizing flow completion times in data centers , 2013, 2013 Proceedings IEEE INFOCOM.

[38]  Louis Anthony Cox,et al.  Wiley encyclopedia of operations research and management science , 2011 .

[39]  Bert Zwart,et al.  Tails in scheduling , 2007, PERV.