Efficient Dynamic Isolation of Congestion in Lossless DataCenter Networks

The architecture of modern DataCenters (DCs) has evolved to meet the stringent communication latency requirements of applications. RDMA technologies such as RoCEv2 have become mainstream because they reduce latency, but their performance suffers on lossy networks due to the overhead introduced by packet retransmissions. Lossless networks are therefore increasingly used in DCs to avoid retransmission delays. However, lossless networks are prone to congestion, which degrades network and system performance. Traditional congestion solutions, such as backpressure or injection throttling, may be ineffective when congestion arises from traffic generated by DC applications. Hence, new efficient congestion management strategies suited to the lossless networks of modern DCs are required. In this paper, we analyze congestion and its negative effects in these scenarios. In addition, we propose and evaluate a congestion management strategy, based on the dynamic isolation of congested flows in special queues, that effectively eliminates the main negative effects of congestion. Unlike previous proposals based on this approach, a single special queue is shared by all the congested flows reaching a port. We also propose enhancements to this basic strategy to optimize its efficiency.
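The isolation idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Port` class, its method names, the identification of congested flows by destination, and the head-of-queue service order are all assumptions made for illustration, and congestion detection itself is left outside the sketch.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Port:
    """Hypothetical switch port with one normal queue and ONE shared
    special queue for all flows currently flagged as congested."""
    normal: deque = field(default_factory=deque)
    isolated: deque = field(default_factory=deque)
    congested_dsts: set = field(default_factory=set)

    def mark_congested(self, dst):
        # Assumption: congested flows are identified by destination.
        self.congested_dsts.add(dst)

    def clear_congested(self, dst):
        self.congested_dsts.discard(dst)

    def enqueue(self, dst, payload):
        # Packets of congested flows go to the shared special queue,
        # so they cannot sit in front of packets of other flows.
        queue = self.isolated if dst in self.congested_dsts else self.normal
        queue.append((dst, payload))

    def dequeue(self, blocked_dsts):
        # Only queue heads are inspected, as in a FIFO buffer.
        # A packet is eligible if its destination is not blocked downstream.
        for queue in (self.normal, self.isolated):
            if queue and queue[0][0] not in blocked_dsts:
                return queue.popleft()
        return None


port = Port()
port.mark_congested(7)
for dst, payload in [(7, "a"), (3, "b"), (7, "c"), (5, "d")]:
    port.enqueue(dst, payload)

# Destination 7 is blocked, yet packets for 3 and 5 still advance.
first = port.dequeue(blocked_dsts={7})
second = port.dequeue(blocked_dsts={7})
third = port.dequeue(blocked_dsts={7})
```

With a single FIFO, the packet for destination 7 at the head would have stalled the packets for destinations 3 and 5; in this sketch they are served while the congested packets wait in the shared special queue.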
