A theory of wormhole routing in parallel computers

Virtually all theoretical work on message routing in parallel computers has dwelt on packet routing: messages are conveyed as packets, an entire packet can reside at a node of the network, and a packet is sent from the queue of one node to the queue of another node until it reaches its destination. The current trend in multicomputer architecture, however, is to use wormhole routing. In wormhole routing, a message is transmitted as a contiguous stream of bits, physically occupying a sequence of nodes and edges in the network. Thus, a message resembles a worm burrowing through the network. The authors give theoretical analyses of simple wormhole routing algorithms, showing them to be nearly optimal for butterfly and mesh-connected networks. The analysis requires initial random delays in injecting messages into the network. They report simulation results suggesting that random initial delays are not only useful for theoretical analysis but may actually improve the performance of wormhole routing algorithms.
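The worm-like behaviour described above can be illustrated with a toy simulation. The sketch below is not the authors' algorithm or analysis; it is a minimal model under stated assumptions: every message has the same length in flits, a worm advances one edge per step, a blocked worm stalls in place while holding its edges, and ties are broken by message index. The names `simulate` and `random_delays` are hypothetical and chosen only for this illustration.

```python
import random


def random_delays(n, max_delay, rng):
    """Draw one injection delay per message, uniform on 0..max_delay (an assumption)."""
    return [rng.randrange(max_delay + 1) for _ in range(n)]


def simulate(paths, length, delays, max_steps=10_000):
    """Toy wormhole model: return the step at which the last message completes.

    paths:  one non-empty list of distinct edges (any hashable) per message
    length: flits per message, assumed identical for all messages
    delays: steps each message waits before it may start moving
    """
    progress = [0] * len(paths)     # moves made so far by each worm
    finish = [None] * len(paths)    # completion step of each worm
    owner = {}                      # edge -> index of the worm holding it
    t = 0
    while None in finish:
        t += 1
        if t > max_steps:
            raise RuntimeError("no progress: possible deadlock in this toy model")
        for i, path in enumerate(paths):
            if finish[i] is not None or t <= delays[i]:
                continue
            p, d = progress[i], len(path)
            if p < d:
                # The head needs the next edge; stall if another worm holds it.
                if owner.get(path[p], i) != i:
                    continue
                owner[path[p]] = i
            p += 1
            progress[i] = p
            # The tail has now crossed edge p - length, so that edge is released.
            k = p - length
            if 0 <= k < d:
                owner.pop(path[k], None)
            # Done once the tail has crossed the last edge.
            if p == d + length - 1:
                finish[i] = t
    return max(finish, default=0)


if __name__ == "__main__":
    rng = random.Random(0)
    shared = [["a", "b", "c"]] * 4          # four worms competing for one path
    print(simulate(shared, 5, [0, 0, 0, 0]))
    print(simulate(shared, 5, random_delays(4, 8, rng)))
```

In this model an unobstructed message with a path of d edges and L flits completes d + L - 1 steps after it is injected, which is the pipelining effect that distinguishes wormhole routing from store-and-forward packet routing. The sketch makes no claim about whether random delays help on any particular network; it only shows where such delays would enter a simulation.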
