TinyFlow: Breaking elephants down into mice in data center networks

The current multipath routing solution in data centers relies on ECMP to distribute traffic among all equal-cost paths. It is well known that ECMP suffers from two deficiencies. First, ECMP does not differentiate between elephant and mice flows, so it creates head-of-line blocking for mice flows in the egress port buffer, which results in long tail latency. Second, it does not fully utilize the available bandwidth because of hash collisions among elephant flows. We propose TinyFlow, a simple yet effective approach that remedies both problems. TinyFlow makes the traffic characteristics of data center networks amenable to ECMP by breaking elephants into mice. When a network carries only a large number of mice flows, ECMP naturally balances the load and performance improves. Our NS-3 simulations show that TinyFlow provides a 20% to 40% speedup in both the mean and the 99th-percentile flow completion time (FCT) of mice flows, and about a 40% throughput improvement for elephant flows.
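The intuition behind "breaking elephants into mice" can be illustrated with a toy load model. The sketch below is an illustration under stated assumptions and does not reproduce TinyFlow's actual mechanism, which this abstract does not describe. It uses `zlib.crc32` as a stand-in for the switch's ECMP hash. It also assumes a hypothetical scheme in which every chunk of an elephant gets its own flow key (for example, a fresh source port) and is therefore hashed independently. The names `ecmp_path`, `split_into_chunks`, `path_loads`, and `imbalance`, along with the 10 KB chunk size, are illustrative choices and not values taken from the paper.

```python
from __future__ import annotations

import zlib


def ecmp_path(flow_key: str, n_paths: int) -> int:
    """Hash a flow identifier onto one of n_paths equal-cost paths (ECMP-style stand-in)."""
    return zlib.crc32(flow_key.encode()) % n_paths


def split_into_chunks(size: int, chunk_size: int) -> list[int]:
    """Cut a flow of `size` bytes into consecutive pieces of at most `chunk_size` bytes."""
    full, rest = divmod(size, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def path_loads(flows: dict[str, int], n_paths: int, chunk_size: int | None = None) -> list[int]:
    """Bytes placed on each path.

    With chunk_size=None every flow is hashed once, as in plain per-flow ECMP.
    Otherwise each chunk is hashed as if it were a separate mouse flow
    (hypothetical: each chunk gets its own key, e.g. a new source port).
    """
    loads = [0] * n_paths
    for name, size in flows.items():
        if chunk_size is None:
            loads[ecmp_path(name, n_paths)] += size
        else:
            for i, piece in enumerate(split_into_chunks(size, chunk_size)):
                loads[ecmp_path(f"{name}#{i}", n_paths)] += piece
    return loads


def imbalance(loads: list[int]) -> float:
    """Max path load divided by mean path load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    # Four 100 MB elephants on four equal-cost paths (illustrative numbers only).
    elephants = {f"elephant{k}": 100 * 1024 * 1024 for k in range(4)}
    per_flow = path_loads(elephants, 4)
    chunked = path_loads(elephants, 4, chunk_size=10 * 1024)  # hypothetical 10 KB chunks
    print("per-flow ECMP imbalance:", imbalance(per_flow))
    print("chunked ECMP imbalance: ", imbalance(chunked))
```

Under plain per-flow ECMP, each elephant is pinned to a single path. Any hash collision between two elephants therefore leaves one path carrying twice its share while another path sits idle. After chunking, thousands of small pieces are hashed independently, and their sum per path tends toward the mean. This mirrors the abstract's argument that ECMP balances load well when the traffic consists only of many mice flows. The sketch ignores packet reordering, queueing, and TCP dynamics, all of which a real design has to handle.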
