Dynamically scheduled VLIW processors

VLIW processors are viewed as an attractive way of achieving instruction-level parallelism because they can issue multiple operations per cycle with relatively simple control logic. They are also perceived as being of limited interest as products because of the problem of object code compatibility across processors that have different hardware latencies and varying levels of parallelism. The author introduces the concept of delayed split-issue and the dynamic scheduling hardware that, together, solve the compatibility problem for VLIW processors and, in fact, make it possible for such processors to use all of the interlocking and scoreboarding techniques that are known for superscalar processors.
