REWIRE: An Optimization-based Framework for Data Center Network Design

Despite the many proposals for data center network (DCN) architectures, designing a DCN remains challenging. DCN design is especially difficult when expanding an existing network, because traditional DCN design places strict constraints on the topology (e.g., a fat-tree). Recent advances in routing protocols allow data center servers to fully utilize arbitrary networks, so there is no longer any need to restrict the data center to regular topologies. We therefore propose REWIRE, a data center network design framework that designs networks using a local search-based algorithm. Our algorithm searches for a network with maximal bisection bandwidth and minimal end-to-end latency while meeting user-defined constraints and accurately modeling the predicted cost of the network. We evaluate REWIRE on a wide range of inputs and find that it significantly outperforms previous solutions: its network designs have up to 100-500% more bisection bandwidth and lower end-to-end network latency than best-practice data center networks.
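To make the idea of local search over arbitrary topologies concrete, the following is a minimal sketch and not REWIRE's actual algorithm. It assumes a much simpler setting: the only objective is mean hop count (a rough stand-in for end-to-end latency; bisection bandwidth is not modeled), the only constraints are a per-switch port limit and a budget of new links, and the search is plain best-improvement hill climbing. The names `mean_hops`, `local_search`, `max_ports`, and `link_budget` are hypothetical and chosen for illustration.

```python
from collections import deque
from itertools import combinations


def mean_hops(adj):
    """Average shortest-path hop count over all ordered node pairs (one BFS per node)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            return float("inf")  # disconnected topology
        total += sum(dist.values())
        pairs += len(adj) - 1
    return total / pairs if pairs else 0.0


def local_search(adj, max_ports, link_budget):
    """Hill climbing (an assumed, simplified search): repeatedly add the
    feasible link that most reduces mean hop count."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    added = []
    best = mean_hops(adj)
    while len(added) < link_budget:
        choice = None
        for u, v in combinations(sorted(adj), 2):
            # Skip existing links and switches with no free port.
            if v in adj[u] or len(adj[u]) >= max_ports or len(adj[v]) >= max_ports:
                continue
            adj[u].add(v)
            adj[v].add(u)
            score = mean_hops(adj)
            adj[u].remove(v)
            adj[v].remove(u)
            if score < best:
                best, choice = score, (u, v)
        if choice is None:
            break  # no feasible link improves the objective
        u, v = choice
        adj[u].add(v)
        adj[v].add(u)
        added.append(choice)
    return adj, added, best


# Usage: expand a 6-switch ring where each switch has 3 ports.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
new_adj, new_links, new_score = local_search(ring, max_ports=3, link_budget=3)
print(new_links, new_score)
```

Starting from an existing topology and only adding links mirrors the expansion scenario described above. A fuller treatment would also need a cost model and a bandwidth objective, and would likely use a search that can escape local optima.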
