Practical DCB for improved data center networks

Storage area networking is driving commodity data center switches to support lossless Ethernet (DCB). Unfortunately, to enable DCB for all traffic on arbitrary network topologies, we must address several problems that can arise in lossless networks, e.g., large buffering delays, unfairness, head of line blocking, and deadlock. We propose TCP-Bolt, a TCP variant that not only addresses the first three problems but reduces flow completion times by as much as 70%. We also introduce a simple, practical deadlock-free routing scheme that eliminates deadlock while achieving aggregate network throughput within 15% of ECMP routing. This small compromise in potential routing capacity is well worth the gains in flow completion time. We note that our results on deadlock-free routing are also of independent interest to the storage area networking community. Further, as our hardware testbed illustrates, these gains are achievable today, without hardware changes to switches or NICs.

[1]  Srinivasan Seshan,et al.  Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems , 2008, FAST.

[2]  Jeffrey C. Mogul,et al.  NetLord: a scalable multi-tenant network architecture for virtualized datacenters , 2011, SIGCOMM.

[3]  Mark Handley,et al.  Design, Implementation and Evaluation of Congestion Control for Multipath TCP , 2011, NSDI.

[4]  Ankit Singla,et al.  Jellyfish: Networking Data Centers Randomly , 2011, NSDI.

[5]  Ali Munir,et al.  Minimizing flow completion times in data centers , 2013, 2013 Proceedings IEEE INFOCOM.

[6]  Amar Phanishayee,et al.  Safe and effective fine-grained TCP retransmissions for datacenter communication , 2009, SIGCOMM '09.

[7]  Nick McKeown,et al.  Deconstructing datacenter packet transport , 2012, HotNets-XI.

[8]  Edgar M. Palmer,et al.  On the spanning tree packing number of a graph: a survey , 2001, Discret. Math..

[9]  Antonio Robles,et al.  A Survey and Evaluation of Topology-Agnostic Deterministic Routing Algorithms , 2012, IEEE Transactions on Parallel and Distributed Systems.

[10]  Albert G. Greenberg,et al.  VL2: a scalable and flexible data center network , 2009, SIGCOMM '09.

[11]  Antony I. T. Rowstron,et al.  Better never than late: meeting deadlines in datacenter networks , 2011, SIGCOMM.

[12]  Brighten Godfrey,et al.  Finishing flows quickly with preemptive scheduling , 2012, CCRV.

[13]  David A. Maltz,et al.  DCTCP: Efficient Packet Transport for the Commoditized Data Center , 2010 .

[14]  T. N. Vijaykumar,et al.  Deadline-aware datacenter tcp (D2TCP) , 2012, SIGCOMM '12.

[15]  Robert Birke,et al.  Got loss? Get zOVN! , 2013, SIGCOMM.

[16]  D. Zats,et al.  DeTail: reducing the flow completion time tail in datacenter networks , 2012, CCRV.

[17]  David L. Black,et al.  The Addition of Explicit Congestion Notification (ECN) to IP , 2001, RFC.

[18]  Rong Pan,et al.  Data center transport mechanisms: Congestion control theory and IEEE standardization , 2008, 2008 46th Annual Allerton Conference on Communication, Control, and Computing.

[19]  Hari Balakrishnan,et al.  TCP ex machina: computer-generated congestion control , 2013, SIGCOMM.

[20]  Alan L. Cox,et al.  PAST: scalable ethernet for data centers , 2012, CoNEXT '12.

[21]  Amin Vahdat,et al.  Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center , 2012, NSDI.

[22]  Amin Vahdat,et al.  Practical TDMA for datacenter ethernet , 2012, EuroSys '12.

[23]  Ramana Rao Kompella,et al.  The TCP Outcast Problem: Exposing Unfairness in Data Center Networks , 2012, NSDI.