Implementation of PFC and RCM for RoCEv2 Simulation in OMNeT++

As traffic patterns and network topologies grow increasingly complex in enterprise data centers and TOP500 supercomputers, network congestion becomes more likely. Left unaddressed, congestion causes long communication delays and degrades network performance. Congestion control mechanisms are often available to mitigate these effects, but they are difficult to configure and activate on production clusters and supercomputers because a misconfigured mechanism may negatively impact running jobs. Simulation is therefore necessary to identify congestion points and sources and, more importantly, to determine optimal settings that reduce congestion in such complex networks. In this paper, we use OMNeT++ to implement IEEE 802.1Qbb Priority-based Flow Control (PFC) and RoCEv2 Congestion Management (RCM) in order to simulate clusters with RoCEv2 interconnects.
