Contention-based congestion management in large-scale networks

Global adaptive routing exploits non-minimal paths to improve performance on adversarial traffic patterns and to load-balance network channels in large-scale networks. However, most prior work on global adaptive routing has assumed admissible traffic patterns, in which no endpoint node is oversubscribed. In the presence of a greedy flow or hotspot traffic, we show how exploiting path diversity with global adaptive routing can spread network congestion and degrade performance. When global adaptive routing is combined with congestion management, the two types of congestion (network congestion, which occurs within the interconnection network channels, and endpoint congestion, which is caused by oversubscribed endpoint nodes) are not properly differentiated. As a result, previously proposed congestion management mechanisms that are effective in addressing endpoint congestion are not necessarily effective when global adaptive routing is also used in the network. We therefore propose a novel, low-cost contention-based congestion management (CBCM) mechanism that identifies endpoint congestion based on the contention within the intermediate routers and at the endpoint nodes. Because contention also occurs under network congestion, the endpoint node (the destination) determines whether the congestion is endpoint congestion or network congestion. If it is only network congestion, CBCM ignores it and adaptive routing is allowed to minimize the network congestion. However, if endpoint congestion occurs, CBCM throttles the hotspot senders and routes their traffic minimally through a separate virtual channel (VC). Our evaluation across different traffic patterns and network sizes demonstrates that our approach is more robust in identifying endpoint congestion in the network while complementing global adaptive routing to avoid network congestion.
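
The decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Congestion`, `Action`, `classify`, and `respond` are hypothetical, and the test used at the destination (offered load exceeding ejection bandwidth) is an assumed stand-in for whatever contention measure CBCM actually uses.

```python
from dataclasses import dataclass
from enum import Enum


class Congestion(Enum):
    NONE = "none"
    NETWORK = "network"    # contention inside the network channels only
    ENDPOINT = "endpoint"  # the destination itself is oversubscribed


@dataclass
class Action:
    throttle_sender: bool
    routing: str          # "adaptive" or "minimal"
    virtual_channel: str  # "default" or "separate"


def classify(router_contention: bool, offered_load: float,
             ejection_bandwidth: float) -> Congestion:
    """Destination-side classification (assumed rule, for illustration only).

    Contention seen in intermediate routers is ambiguous on its own, so the
    destination makes the final call: it reports endpoint congestion only
    when it is offered more traffic than it can eject.
    """
    if offered_load > ejection_bandwidth:
        return Congestion.ENDPOINT
    if router_contention:
        return Congestion.NETWORK
    return Congestion.NONE


def respond(kind: Congestion) -> Action:
    """Map the classification to the reaction described in the abstract."""
    if kind is Congestion.ENDPOINT:
        # Throttle the hotspot senders and route minimally on a separate VC.
        return Action(throttle_sender=True, routing="minimal",
                      virtual_channel="separate")
    # Network congestion (or none): leave it to global adaptive routing.
    return Action(throttle_sender=False, routing="adaptive",
                  virtual_channel="default")
```

In this sketch, router contention alone never triggers throttling; only the destination's own view of oversubscription does, which mirrors the abstract's point that the two kinds of congestion must be told apart before reacting.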
