Survey on Traffic Management in Data Center Network: From Link Layer to Application Layer

Due to the explosive growth of Internet services of all kinds, data centers have become vital and irreplaceable infrastructure for supporting this trend. Compared with traditional networks, data center networks (DCNs) have distinctive features, such as high bandwidth, low latency, many-to-one communication patterns, shallow-buffered switches, and multi-rooted topologies. These characteristics pose many challenges to existing network techniques (e.g., Ethernet, Equal-Cost Multi-Path (ECMP) routing, TCP), which adapt poorly to DCNs and suffer severe performance degradation. To address these challenges, DCNs have attracted considerable attention from industry and academia in recent years, and many new mechanisms at different layers have been proposed to improve their transmission performance. Meanwhile, many surveys have appeared that introduce current DCN research. However, previous surveys mainly focus on a single network layer, making it difficult for readers to understand recent research at a holistic level. To help readers quickly grasp the current research progress on DCNs, we employ a multi-layered, top-down taxonomy to classify the literature and propose several possible directions for future research in this area.
