Cerberus

The bandwidth and latency requirements of modern datacenter applications have led researchers to propose various topology designs using static, dynamic demand-oblivious (rotor), and/or dynamic demand-aware switches. However, given the diverse nature of datacenter traffic, there is little consensus about how these designs would fare against each other. In this work, we analyze the throughput of existing topology designs under different traffic patterns and study their unique advantages and potential costs in terms of bandwidth and latency "tax". To overcome the identified inefficiencies, we propose Cerberus, a unified, two-layer leaf-spine optical datacenter design with three topology types. Cerberus systematically matches different traffic patterns with their most suitable topology type: e.g., latency-sensitive flows are transmitted via a static topology, all-to-all traffic via a rotor topology, and elephant flows via a demand-aware topology. We show analytically and in simulations that Cerberus can improve throughput significantly compared to alternative approaches and operate datacenters at higher loads while being throughput-proportional.
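The traffic-to-topology matching policy described in the abstract can be illustrated with a minimal sketch. The classifier below is only an illustration under assumed inputs: the flow attributes (size, latency sensitivity) and the elephant-flow threshold are hypothetical placeholders, not parameters taken from the Cerberus paper.

    # Minimal sketch (assumptions, not the paper's implementation): route each
    # flow to one of Cerberus's three topology types based on its properties.
    from dataclasses import dataclass
    from enum import Enum, auto

    class TopologyType(Enum):
        STATIC = auto()        # static switches: latency-sensitive flows
        ROTOR = auto()         # demand-oblivious rotor switches: all-to-all traffic
        DEMAND_AWARE = auto()  # reconfigurable demand-aware switches: elephant flows

    @dataclass
    class Flow:
        src_rack: int
        dst_rack: int
        size_bytes: int
        latency_sensitive: bool

    # Hypothetical cutoff separating "elephant" flows from the rest.
    ELEPHANT_THRESHOLD_BYTES = 15 * 2**20

    def classify(flow: Flow) -> TopologyType:
        """Match a flow to the topology type suggested by the abstract's policy."""
        if flow.latency_sensitive:
            # Short, latency-sensitive flows avoid reconfiguration and
            # rotor-cycle delays by using the static topology.
            return TopologyType.STATIC
        if flow.size_bytes >= ELEPHANT_THRESHOLD_BYTES:
            # Elephant flows can amortize the reconfiguration (latency) tax
            # of a demand-aware circuit over a long transmission.
            return TopologyType.DEMAND_AWARE
        # Remaining traffic (e.g., all-to-all shuffles) uses rotor switches,
        # avoiding the bandwidth tax of multi-hop forwarding without
        # per-demand reconfiguration.
        return TopologyType.ROTOR

    if __name__ == "__main__":
        flows = [
            Flow(0, 1, 4_096, latency_sensitive=True),
            Flow(0, 2, 64 * 2**20, latency_sensitive=False),
            Flow(1, 3, 512 * 2**10, latency_sensitive=False),
        ]
        for f in flows:
            print(f, "->", classify(f).name)

The thresholds and the per-flow classification granularity are design choices left open here; the paper's analysis concerns throughput under such a matching, not this particular decision rule.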
