Xpander: Unveiling the Secrets of High-Performance Datacenters

Many architectures for high-performance datacenters have been proposed. Surprisingly, recent studies show that datacenter designs with random network topologies outperform more sophisticated designs, achieving near-optimal throughput and bisection bandwidth, high resiliency to failures, incremental expandability, high cost efficiency, and more. Unfortunately, the inherent unstructuredness and unpredictability of random designs pose serious, arguably insurmountable, obstacles to their adoption in practice. Can these guarantees be achieved by well-structured, deterministic datacenters? We provide a surprising affirmative answer. We show, through a combination of theoretical analyses, extensive simulations, and experiments with a network emulator, that any "expander" network topology (as indeed are random graphs) comes with these benefits. We leverage this insight to present Xpander, a novel deterministic datacenter architecture that achieves all of the above desiderata while providing a tangible alternative to existing datacenter designs. We discuss challenges en route to deploying Xpander (including physical layout, cabling costs and complexity, backwards compatibility) and explain how these can be resolved.
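As a rough illustration of what "expander" means here, the sketch below brute-forces the edge expansion of two tiny switch graphs: the smallest ratio of links leaving a set of switches to the size of that set, taken over all sets containing at most half the switches. Edge expansion is one standard way to quantify expansion and is an assumption of this example, as are the function name `edge_expansion` and the two example graphs (a 10-node ring and the Petersen graph). Neither graph is the Xpander construction, and the brute-force search is only feasible at toy sizes.

```python
from itertools import combinations


def edge_expansion(n, edges):
    """Brute-force h(G): min over nonempty S with |S| <= n/2 of |cut(S)| / |S|."""
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            inside = set(subset)
            # Count links with exactly one endpoint inside the subset.
            cut = sum((u in inside) != (v in inside) for u, v in edges)
            best = min(best, cut / size)
    return best


# Hypothetical toy topology 1: a 10-switch ring (2 links per switch).
ring = [(i, (i + 1) % 10) for i in range(10)]

# Hypothetical toy topology 2: the Petersen graph (10 switches, 3 links each).
petersen = (
    [(i, (i + 1) % 5) for i in range(5)]              # outer 5-cycle
    + [(i, i + 5) for i in range(5)]                  # spokes
    + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]    # inner pentagram
)

h_ring = edge_expansion(10, ring)
h_petersen = edge_expansion(10, petersen)
print("ring:", h_ring, "petersen:", h_petersen)
```

In the ring, half of the switches can be cut off by removing just two links, whereas every half of the Petersen graph keeps at least five links to the other half. Well-connected cuts of this kind are what underlie the bisection bandwidth and failure resiliency mentioned in the abstract.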
