The Adaptive Bubble Router

This paper presents the design of a new adaptive virtual cut-through router for torus networks. At a much lower VLSI cost than adaptive wormhole routers, the adaptive Bubble router is even faster than deterministic wormhole routers based on virtual channels. This is achieved by combining a low-cost deadlock avoidance mechanism for virtual cut-through networks, called Bubble flow control, with a suitable design of the router's arbiter. A thorough methodology is used to quantify the impact of this router design at all levels, from hardware cost to system performance when running parallel applications. At the VLSI level, our proposal is the adaptive router with the shortest clock cycle and node delay among the state-of-the-art alternatives compared. This translates into the lowest latency and highest throughput under standard synthetic loads. At the system level, these gains reduce the execution time of the benchmarks considered: compared with current adaptive wormhole routers, execution time is reduced by up to 27%. Furthermore, this is the only router that improves system performance when compared with simpler static designs.
