A Survey on Data Center Networking (DCN): Infrastructure and Operations

Driven by the exponential growth of Internet services, data centers (DCs) have become indispensable infrastructure. A DC typically houses a large number of computing and storage nodes, interconnected by a specially designed network, namely the DC network (DCN). The DCN serves as the communication backbone and plays a pivotal role in optimizing DC operations. However, compared to traditional networks, the unique requirements of the DCN, for example, large scale, vast application diversity, high power density, and high reliability, pose significant challenges to its infrastructure and operations. We have observed from premier publication venues (e.g., journals and systems conferences) that increasing research effort is being devoted to optimizing the design and operations of the DCN. In this paper, we present a systematic taxonomy and survey of recent research efforts on the DCN. Specifically, we classify these efforts into two areas: 1) DCN infrastructure and 2) DCN operations. For the former, we review and compare the transmission technologies and network topologies used or proposed for the DCN infrastructure. For the latter, we summarize existing traffic control techniques in DCN operations and survey optimization methods for achieving diverse operational objectives, including high network utilization, fair bandwidth sharing, low service latency, low energy consumption, and high resiliency. We conclude this survey by envisioning a few open research opportunities in DCN infrastructure and operations.
