Integrating Message-Passing with Vector Architectures

Vector architectures provide excellent computational throughput while tolerating memory latency by pipelining memory accesses. In this paper, we propose a generalization of vector architectures to message-passing multicomputers that combines the efficiency of vector computation with the scalability of distributed-memory systems. In our proposed architecture, each node is a conventional vector processor (with chaining capability and pipelined functional units) augmented by native instructions to send and receive messages through vector registers. Under this scheme, inter-node communication is performed via vector-send/receive instructions, gaining the benefits of communication pipelining, reduced memory copies (memory-to-register-to-register instead of memory-to-memory-to-cache), and lower communication latency (due to tight processor-communication coupling). We show that this strong integration between functional and communication units can lead to substantial performance improvement over conventional message-passing multicomputers. We model pipelined computation-communication systems both analytically and with a detailed instruction-level simulation, and compare the simulation data with empirical results from an Intel Paragon. Preliminary data from a matrix multiplication example indicate that our proposed vector-parallel architecture offers significant scalability benefits over existing message-passing systems.
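The contrast between copy-based message passing and pipelined vector-send communication can be illustrated with a toy cost model. The sketch below is not the analytical model developed in the paper; the function names, the cost parameters (startup, per-element copy cost, per-element wire cost, per-strip issue overhead), and the vector register length of 64 are all assumptions chosen only for illustration.

```python
def copy_based_time(n_elems, startup, per_elem_copy, per_elem_wire, copies=2):
    """Toy cost of a conventional send: the memory copies and the wire
    transfer are assumed to run back-to-back with no overlap.
    All parameters are hypothetical cycle counts."""
    return startup + n_elems * (copies * per_elem_copy + per_elem_wire)


def pipelined_time(n_elems, startup, per_elem_wire,
                   vector_len=64, per_strip_issue=1):
    """Toy cost of a vector-send: elements stream out of vector registers,
    so the cost is a startup, one issue overhead per vector-length strip,
    and one wire slot per element. All parameters are hypothetical."""
    strips = -(-n_elems // vector_len)  # ceiling division
    return startup + strips * per_strip_issue + n_elems * per_elem_wire


def speedup(n_elems, **kw):
    """Ratio of the copy-based cost to the pipelined cost for one message."""
    slow = copy_based_time(n_elems, kw["sw_startup"],
                           kw["per_elem_copy"], kw["per_elem_wire"])
    fast = pipelined_time(n_elems, kw["hw_startup"], kw["per_elem_wire"])
    return slow / fast


if __name__ == "__main__":
    # Hypothetical parameters, in cycles; not measurements from any machine.
    params = dict(sw_startup=100, hw_startup=10,
                  per_elem_copy=1, per_elem_wire=1)
    for n in (64, 1024, 16384):
        print(n, round(speedup(n, **params), 2))
```

In this simplified model the benefit comes from two separate terms: the smaller startup cost, standing in for tighter processor-communication coupling, matters most for short messages, while the eliminated per-element copies matter most for long ones.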
