Schemes for Fast Transmission of Flows in Data Center Networks

In this paper, we survey existing schemes for the transmission of flows in Data Center Networks (DCNs). The transport of flows in DCNs must cope with the bandwidth demands of the traffic generated by a large number of data center applications, and it must achieve high utilization of the data center infrastructure to make the data center financially viable. Traffic in DCNs roughly comprises short flows, which are generated by the Partition/Aggregate model adopted by several applications and have sizes of a few kilobytes, and long flows, which carry data for the operation and maintenance of the data center and have sizes on the order of megabytes. Short flows must be transmitted (or completed) as soon as possible or within a deadline, and long flows must be serviced with a minimum acceptable throughput. The coexistence of short and long flows may make it difficult to achieve both performance objectives simultaneously. This challenge has motivated growing research on schemes for managing the transmission of flows in DCNs. We describe several recent schemes aimed at reducing the flow completion time in DCNs. We also summarize existing solutions for the incast traffic phenomenon. We compare and classify the surveyed schemes, describe their advantages and disadvantages, and identify trends in scheme design. For completeness, we describe some DCN architectures, discuss the traffic patterns of DCNs, and discuss why some existing versions of transport protocols may not be usable in DCNs. Finally, we discuss some of the identified research challenges.
