Issues in the Design of High Performance SIMD Architectures

In this paper, we consider the design of high performance SIMD architectures. We examine three mechanisms that can improve the performance of this class of machines and that have been largely unexplored by the SIMD community. The mechanisms are pipelined instruction broadcast, pipelining of the PE architecture, and the introduction of a novel memory hierarchy in the PE address space, which we denote the direct only data cache (dod-cache). For each of these performance improvements, we develop an analytical model of the potential speedup and apply that model to real program traces obtained on a MasPar MP-2 system. In addition, we consider the impact of all three improvements taken together.
