Adaptive Routing in High-Radix Clos Network

Recent increases in the pin bandwidth of integrated circuits have motivated an increase in the degree, or radix, of interconnection network routers. The folded-Clos network can take advantage of these high-radix routers, and this paper investigates adaptive routing in such networks. We show that adaptive routing, if done properly, outperforms oblivious routing by providing lower latency, lower latency variance, and higher throughput with limited buffering. Adaptive routing is particularly useful for load balancing around nonuniformities caused by deterministically routed traffic or by faults in the network. We evaluate alternative allocation algorithms used in adaptive routing and compare their performance. Randomization in the allocation algorithms can simplify the implementation while sacrificing minimal performance. The cost of adaptive routing, in terms of router latency and area, increases in high-radix routers. We show that using imprecise queue information reduces implementation complexity, and that precomputing the allocations minimizes the impact of adaptive routing on router latency.
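The contrast between oblivious and adaptive uplink selection can be illustrated with a small sketch. This is not the paper's allocator or simulator: it is a static balls-into-bins model in which packets are assigned to uplink queues and nothing drains. All names (`pick_oblivious`, `pick_two_choice`, `pick_least_loaded`, `simulate`, `imbalance`) and the parameters (32 uplinks, 3200 packets) are assumptions made for illustration. The two-choice policy samples two random queues and takes the shorter one, which is one way a randomized allocation might trade a little balance for a much simpler comparison.

```python
import random


def pick_oblivious(queues, rng):
    """Oblivious routing: choose an uplink uniformly at random, ignoring load."""
    return rng.randrange(len(queues))


def pick_two_choice(queues, rng):
    """Randomized adaptive choice: sample two distinct uplinks, take the shorter queue."""
    a, b = rng.sample(range(len(queues)), 2)
    return a if queues[a] <= queues[b] else b


def pick_least_loaded(queues, rng):
    """Full adaptive choice: scan every uplink and pick a shortest queue (random tie-break)."""
    shortest = min(queues)
    candidates = [i for i, depth in enumerate(queues) if depth == shortest]
    return rng.choice(candidates)


def simulate(pick, num_uplinks=32, num_packets=3200, seed=0):
    """Assign packets to uplink queues one at a time; queues never drain in this model."""
    rng = random.Random(seed)
    queues = [0] * num_uplinks
    for _ in range(num_packets):
        queues[pick(queues, rng)] += 1
    return queues


def imbalance(queues):
    """Gap between the deepest and shallowest queue."""
    return max(queues) - min(queues)


if __name__ == "__main__":
    for policy in (pick_oblivious, pick_two_choice, pick_least_loaded):
        print(policy.__name__, imbalance(simulate(policy)))
```

In this toy model the full scan balances the queues exactly, while the two-choice policy needs only a single comparison per packet and still tends to stay far closer to balanced than the oblivious policy. A real router would also have to account for queue draining, stale queue information, and allocation latency, none of which are modeled here.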
