Vector architectures provide excellent computational throughput while tolerating memory latency by pipelining memory accesses. In this paper, we propose a generalization of vector architectures to message-passing multicomputers that combines the efficiency of vector computation with the scalability of distributed-memory systems. In our proposed architecture, each node is a conventional vector processor (with chaining capability and pipelined functional units) augmented by native instructions to send and receive messages through vector registers. In this scheme, inter-node communication is performed via vector-send/receive instructions, gaining the benefits of communication pipelining, fewer memory copies (memory-to-register-to-register instead of memory-to-memory-to-cache), and lower communication latency (due to tight processor-communication coupling). We show that this strong integration between functional and communication units can lead to substantial performance improvement over conventional message-passing multicomputers. We model pipelined computation-communication systems both analytically and with a detailed instruction-level simulation, and compare the simulation data with empirical results from an Intel Paragon. Preliminary data from a matrix multiplication example indicate that our proposed vector-parallel architecture offers significant scalability benefits over existing message-passing systems.