Increasing network bandwidth on meshes

In bandwidth-limited computers, such as meshes and tori, it is important to achieve high bandwidth across the bisection. Traditional techniques achieve bandwidth utilization in the range of 30–70%. We show how to use barriers, in particular Integrated Network Barriers, to achieve bandwidth utilization that is arbitrarily close to 100%. This technique also provides low latency and fairness to processors. Moreover, it works globally and therefore does not depend on local approximations of network traffic.
