Flattened butterfly: a cost-efficient topology for high-radix networks

Increasing integrated-circuit pin bandwidth has motivateda corresponding increase in the degree or radix of interconnection networksand their routers. This paper introduces the flattened butterfly, a cost-efficient topology for high-radix networks. On benign (load-balanced) traffic, the flattened butterfly approaches the cost/performance of a butterfly network and has roughly half the cost of a comparable performance Clos network.The advantage over the Clos is achieved by eliminating redundant hopswhen they are not needed for load balance. On adversarial traffic, the flattened butterfly matches the cost/performance of a folded-Clos network and provides an order of magnitude better performance than a conventional butterfly.In this case, global adaptive routing is used to switchthe flattened butterfly from minimal to non-minimal routing - usingredundant hops only when they are needed. Minimal and non-minimal, oblivious and adaptive routing algorithms are evaluated on the flattened butterfly.We show that load-balancing adversarial traffic requires non-minimalglobally-adaptive routing and show that sequential allocators are required to avoid transient load imbalance when using adaptive routing algorithms.We also compare the cost of the flattened butterfly to folded-Clos, hypercube,and butterfly networks with identical capacityand show that the flattened butterfly is more cost-efficient thanfolded-Clos and hypercube topologies.

[1]  Charles Clos,et al.  A study of non-blocking switching networks , 1953 .

[2]  Howard Jay Siegel,et al.  A Model of SIMD Machines and a Comparison of Various Interconnection Networks , 1979, IEEE Transactions on Computers.

[3]  Leslie G. Valiant,et al.  A Scheme for Fast Parallel Communication , 1982, SIAM J. Comput..

[4]  Marc Snir,et al.  The Performance of Multistage Interconnection Networks for Multiprocessors , 1983, IEEE Transactions on Computers.

[5]  Dharma P. Agrawal,et al.  Generalized Hypercube and Hyperbus Structures for a Computer Network , 1984, IEEE Transactions on Computers.

[6]  Charles E. Leiserson,et al.  Fat-trees: Universal networks for hardware-efficient supercomputing , 1985, IEEE Transactions on Computers.

[7]  William J. Dally,et al.  Virtual-channel flow control , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[8]  William J. Dally,et al.  Performance Analysis of k-Ary n-Cube Interconnection Networks , 1987, IEEE Trans. Computers.

[9]  Sudhakar Yalamanchili,et al.  Adaptive routing in generalized hypercube architectures , 1991, Proceedings of the Third IEEE Symposium on Parallel and Distributed Processing.

[10]  Sanjeev Saxena,et al.  On Parallel Prefix Computation , 1994, Parallel Process. Lett..

[11]  Steven L. Scott,et al.  The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus , 1996 .

[12]  Sudhakar Yalamanchili,et al.  Power constrained design of multiprocessor interconnection networks , 1997, Proceedings International Conference on Computer Design VLSI in Computers and Processors.

[13]  D. Lenoski,et al.  The SGI Origin: A ccnuma Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[14]  Henry G. Dietz,et al.  Compiler Techniques for Flat Neighborhood Networks , 2000, LCPC.

[15]  J. Wei,et al.  A 0.4-4 Gb/s CMOS quad transceiver cell using on-chip regulated dual-loop PLLs , 2002, 2002 Symposium on VLSI Circuits. Digest of Technical Papers (Cat. No.02CH37302).

[16]  Mahmut T. Kandemir,et al.  Energy optimization techniques in cluster interconnects , 2003, ISLPED '03.

[17]  William J. Dally,et al.  GOAL: a load-balanced adaptive routing algorithm for torus networks , 2003, ISCA '03.

[18]  Li Shang,et al.  Dynamic voltage scaling with links for power optimization of interconnection networks , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[19]  Power-driven Design of Router Microarchitectures in On-chip Networks , 2003, MICRO.

[20]  A 27-mW 3.6-gb/s I/O transceiver , 2004, IEEE Journal of Solid-State Circuits.

[21]  William J. Dally,et al.  Principles and Practices of Interconnection Networks , 2004 .

[22]  Li-Shiuan Peh,et al.  Design-space exploration of power-aware on/off interconnection networks , 2004, IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2004. ICCD 2004. Proceedings..

[23]  William J. Dally,et al.  Microarchitecture of a high radix router , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[24]  William J. Dally,et al.  Adaptive Routing in High-Radix Clos Network , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[25]  William J. Dally,et al.  The BlackWidow High-Radix Clos Network , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).