DumbNet: a smart data center network fabric with dumb switches

Today's data center networks have already pushed many functions to hosts. A fundamental question is how to divide functions between the network and host software. We present DumbNet, a new data center network architecture that keeps no state in switches. DumbNet switches have no forwarding tables and therefore require no configuration. Almost all control plane functions are pushed to hosts: a host determines the entire path of a packet and writes that path as tags in the packet header. Switches only need to examine the tags to forward packets and to monitor port state. We design a set of host-based mechanisms to make the new architecture viable, from network bootstrapping and topology maintenance to routing and failure handling. We build a prototype with 7 switches and 27 servers, as well as an FPGA-based switch. Extensive evaluations show that DumbNet achieves performance comparable to traditional networks, supports application-specific extensions such as flowlet-based traffic engineering, and remains extremely simple and easy to manage.
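The division of labor described above (hosts choose the full path and encode it as tags, switches only read tags) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the `TOPOLOGY` map, the choice of output-port numbers as tags, the rule that a switch consumes the first tag, and the helper names `compute_tags`, `switch_forward`, and `deliver`.

```python
from collections import deque

# Hypothetical topology known to the host: switch -> {output port: neighbor}.
# Names starting with "s" are switches, names starting with "h" are hosts.
TOPOLOGY = {
    "s1": {1: "s2", 2: "s3", 3: "hA"},
    "s2": {1: "s1", 2: "s4"},
    "s3": {1: "s1", 2: "s4"},
    "s4": {1: "s2", 2: "s3", 3: "hB"},
}


def compute_tags(topology, first_switch, dst_host):
    """Host side: breadth-first search from the first-hop switch.

    Returns the list of output ports (one tag per switch on the path)
    that leads to dst_host, or None if the host is unreachable.
    """
    queue = deque([(first_switch, [])])
    visited = {first_switch}
    while queue:
        node, ports = queue.popleft()
        for port, neighbor in sorted(topology[node].items()):
            if neighbor == dst_host:
                return ports + [port]
            # Only switches (keys of the topology map) are expanded further.
            if neighbor in topology and neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, ports + [port]))
    return None


def switch_forward(packet):
    """Switch side: no table lookup, just consume the first tag.

    Returns the output port and the packet with that tag removed.
    """
    tags, payload = packet
    return tags[0], (tags[1:], payload)


def deliver(topology, first_switch, packet):
    """Simulate the fabric: hand the packet from switch to switch."""
    node = first_switch
    while node in topology:  # stop once the packet reaches a host
        out_port, packet = switch_forward(packet)
        node = topology[node][out_port]
    return node, packet


tags = compute_tags(TOPOLOGY, "s1", "hB")
destination, remaining = deliver(TOPOLOGY, "s1", (tags, "payload"))
```

In this sketch the switch function never consults the topology; only the host-side search and the simulated wiring in `deliver` do, which mirrors the idea that path knowledge lives entirely at the hosts.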
