Deadlock-free local fast failover for arbitrary data center networks

Given the size of today's data center networks and their bursty workloads, it is likely that at any given moment packets are being lost to some type of failure in the network. This paper addresses the two most common types of data center network failures: congestion and routing failures. Recently, there has been demand for lossless Ethernet (Data Center Bridging, DCB) in data center networks as a solution to congestion failures. However, DCB complicates fault tolerance by introducing a new type of failure: deadlock. If DCB is enabled, then all routing must be deadlock-free, including any backup routes used for failover. To the best of our knowledge, this paper describes the first deadlock-free approaches to local fast failover that can be combined with DCB: DF-FI resilience and DF-EDST resilience. Moreover, the evaluation shows that DF-EDST resilience, the paper's main contribution, can improve fault tolerance without adversely impacting performance when compared with a state-of-the-art approach to deadlock-free routing. If, however, a small reduction in aggregate throughput is acceptable, then it is possible to build routes such that only 0.00001% of the total flows in the network are likely to fail given 16 edge failures on networks with 1K-4K hosts.
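To make the deadlock-freedom requirement concrete, the sketch below checks a set of routes against the classical channel dependency graph condition of Dally and Seitz: with deterministic routes and a single lossless traffic class, routing is deadlock-free when no cycle exists among the "holds one link while waiting for the next" dependencies. This is not the paper's DF-FI or DF-EDST construction. The function names and the representation of a route as a list of switch IDs are assumptions made for illustration.

```python
from collections import defaultdict


def channel_dependencies(routes):
    """Build the channel dependency graph for a set of routes.

    Each route is a list of node IDs (hypothetical representation).
    A channel is a directed link (u, v). A route that crosses (u, v)
    and then (v, w) makes (u, v) depend on (v, w).
    """
    deps = defaultdict(set)
    for path in routes:
        channels = list(zip(path, path[1:]))
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Iterative depth-first search for a cycle in the dependency graph."""
    ON_STACK, DONE = 1, 2
    state = {}
    for start in list(deps):
        if start in state:
            continue
        state[start] = ON_STACK
        stack = [(start, iter(deps.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if state.get(nxt) == ON_STACK:
                    return True  # back edge: cyclic dependency
                if nxt not in state:
                    state[nxt] = ON_STACK
                    stack.append((nxt, iter(deps.get(nxt, ()))))
                    break
            else:
                # All successors explored without finding a cycle.
                state[node] = DONE
                stack.pop()
    return False


def is_deadlock_free(routes):
    """True if the routes induce an acyclic channel dependency graph."""
    return not has_cycle(channel_dependencies(routes))


# Four switches in a ring, every route going two hops clockwise:
# the dependencies wrap around the ring and form a cycle.
ring_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]

# Routes along a line (a tree): the two directions never depend on each other.
tree_routes = [[0, 1, 2], [2, 1, 0]]

print(is_deadlock_free(ring_routes))
print(is_deadlock_free(tree_routes))
```

For a failover scheme, the `routes` argument would have to contain every backup path that a packet could follow after a failure, in addition to the primary paths, since any of them can contribute a dependency.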