Fast-Response Dynamic Routing Balancing for high-speed interconnection networks

Communication requirements in High Performance Computing systems demand high-speed interconnection networks to connect processing nodes. However, when communication load is unevenly distributed across network resources, message congestion appears. Congestion spreading increases latency and reduces network throughput, causing significant performance degradation. Fast-Response Dynamic Routing Balancing (FR-DRB) is a method developed to balance communication load uniformly over the interconnection network. FR-DRB distributes message traffic through a gradual, load-controlled expansion of paths: it monitors message latency in the network and decides how many alternative paths to use for message delivery between each source-destination pair. The performance of FR-DRB has been compared with other routing policies under a representative set of traffic patterns commonly generated by parallel scientific applications. Experimental results show a significant improvement in both latency and throughput.
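The abstract describes the control loop only at a high level. The Python sketch below illustrates the general idea of latency-driven, gradual path expansion for a single source-destination pair. It is not the FR-DRB algorithm itself. The class name `PathExpansionController`, the two-threshold (hysteresis) rule, averaging latency over the active paths, and sending each message on the active path with the lowest last-known latency are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PathExpansionController:
    """Hypothetical per source-destination controller (illustrative sketch only).

    Assumption: paths are opened one at a time while the average latency of the
    active paths exceeds high_threshold, and closed one at a time once it falls
    below low_threshold. Thresholds and path ordering are assumed parameters.
    """
    candidate_paths: List[str]   # index 0 is the minimal path; the rest are alternatives
    high_threshold: float        # expand when average latency rises above this
    low_threshold: float         # contract when average latency falls below this
    active: int = 1              # number of candidate paths currently in use
    path_latency: Dict[str, float] = field(default_factory=dict)  # last latency per path

    def __post_init__(self) -> None:
        if not self.candidate_paths:
            raise ValueError("need at least one (minimal) path")
        if self.low_threshold >= self.high_threshold:
            raise ValueError("low_threshold must be below high_threshold")

    def active_paths(self) -> List[str]:
        return self.candidate_paths[: self.active]

    def average_latency(self) -> float:
        # Only active paths that have already reported a latency are averaged.
        seen = [self.path_latency[p] for p in self.active_paths() if p in self.path_latency]
        return sum(seen) / len(seen) if seen else 0.0

    def record_latency(self, path: str, latency: float) -> None:
        """Store the latency observed on a path, then expand or contract by one path."""
        self.path_latency[path] = latency
        avg = self.average_latency()
        if avg > self.high_threshold and self.active < len(self.candidate_paths):
            self.active += 1  # gradual expansion: open one more alternative path
        elif avg < self.low_threshold and self.active > 1:
            self.active -= 1  # contraction: close the most recently opened path
            # Forget its stale measurement so a later reopening starts fresh.
            self.path_latency.pop(self.candidate_paths[self.active], None)

    def choose_path(self) -> str:
        """Pick the active path with the lowest last-known latency (unmeasured counts as 0)."""
        return min(self.active_paths(), key=lambda p: self.path_latency.get(p, 0.0))


# Usage: congestion on the minimal path opens alternatives; recovery closes one.
ctrl = PathExpansionController(["minimal", "alt1", "alt2"], high_threshold=10.0, low_threshold=4.0)
ctrl.record_latency("minimal", 15.0)  # average 15.0 > 10.0, so a second path opens
ctrl.record_latency("alt1", 12.0)     # average 13.5 > 10.0, so a third path opens
ctrl.record_latency("alt2", 2.0)      # average ~9.7 is between thresholds: no change
```

Expanding or contracting by a single path per measurement mirrors the "gradual" expansion described in the abstract, and keeping two separate thresholds is one simple way (assumed here) to avoid oscillating between path counts when latency hovers near a single limit.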
