An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors

Processors with multiple functional units, such as the CRAY-1, Cyber 205, and FPS-164, have been used for high-end scientific computation. Much effort has gone into increasing the throughput of such systems. One critical design consideration is the identification and implementation of a suitable instruction issuing scheme. Existing approaches do not issue enough instructions per machine cycle to fully utilize the functional units, and so fall short of the performance these execution resources can deliver.
