DSNS (dynamically-hazard-resolved, statically-code-scheduled, nonuniform superscalar): yet another superscalar processor architecture

A new superscalar processor architecture, called DSNS (Dynamically-hazard-resolved, Statically-code-scheduled, Nonuniform Superscalar), is proposed and discussed. DSNS has the following major architectural features.

1. Dynamically-hazard-resolved superscalar: DSNS is object-code compatible with respect to the degree of superscalar. Pipeline interlock hardware is provided for detecting and resolving hazards at run time.
2. Statically-code-scheduled superscalar: The performance of DSNS is not guaranteed to scale with the degree of superscalar. Compilers are responsible for scheduling instructions to reduce pipeline stalls for a particular degree of superscalar.
3. Nonuniform superscalar: Although a nonuniform superscalar potentially suffers instruction-class conflicts, it can be more cost-effective than a uniform superscalar. Again, compilers must ensure that class conflicts do not increase structural hazards.
4. Static memory disambiguation: The DSNS architecture provides three types of LOAD/STORE instructions: strongly ordered, weakly ordered, and unordered. Memory disambiguation at compile time is responsible for marking each LOAD/STORE instruction. At run time, the processor need not detect or resolve data hazards for any type; it simply performs memory accesses in order for strongly or weakly ordered instructions, and in arbitrary order for unordered ones.
5. Static branch prediction with a branch-target buffer: Branch instructions predicted as taken by the compiler are stored in the branch-target buffer. Hardware never guesses the outcomes of branch instructions.
6. Early branch resolution with advanced conditioning: Advanced conditioning allows branch decisions to precede the corresponding branches further. It reduces the branch delay and results in resolving branches early in the pipeline.
7. Conditional-mode execution with dual register files: The dual register files facilitate maintaining the precise machine state that might otherwise be violated by speculative execution such as conditional mode.
8. Weakly precise interrupts: The DSNS architecture defines interrupts as somewhat imprecise but restartable with the help of interrupt handlers. This definition alleviates the hardware constraints of strongly ensuring precise interrupts.

This paper also presents an implementation of the DSNS architecture. The DSNS processor prototype under development is a four-stage pipelined processor of superscalar degree four. The instruction pipelines, especially the branch pipeline, are discussed in detail.
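The static memory disambiguation scheme (feature 4) can be sketched as a small executable model. This is an illustrative assumption, not the DSNS hardware: the class names (`Order`, `MemOp`, `issue`) and the hoist-unordered-first scheduling policy are inventions for demonstration. The point it shows is that ordering guarantees are decided at compile time by the marking on each LOAD/STORE, so the run-time issue logic never needs to compare addresses.

```python
# Sketch of static memory disambiguation: the compiler marks each
# LOAD/STORE as strongly ordered, weakly ordered, or unordered, and the
# issue logic enforces program order only for the ordered classes.
# Names and the scheduling policy below are illustrative assumptions.
from enum import Enum

class Order(Enum):
    STRONG = "strongly ordered"   # must be performed in program order
    WEAK = "weakly ordered"       # must be performed in program order
    NONE = "unordered"            # may be performed in any order

class MemOp:
    def __init__(self, name, order):
        self.name = name          # mnemonic, e.g. "LOAD [y]"
        self.order = order        # marking chosen at compile time

def issue(ops):
    """Return one legal execution order for the marked memory ops.

    Ordered ops keep their relative program-order positions; unordered
    ops are free to move (here they are simply hoisted to the front,
    one legal schedule among many)."""
    unordered = [op for op in ops if op.order is Order.NONE]
    ordered = [op for op in ops if op.order is not Order.NONE]
    return unordered + ordered

program = [
    MemOp("STORE [x]", Order.STRONG),
    MemOp("LOAD  [y]", Order.NONE),    # compiler proved no aliasing
    MemOp("LOAD  [x]", Order.STRONG),  # may alias the STORE above
]

for op in issue(program):
    print(op.name, "-", op.order.value)
```

Note that no run-time address comparison appears anywhere in `issue`: the unordered LOAD moves ahead of the STORE only because the compiler already proved the accesses disjoint and marked it accordingly, which is exactly the hardware simplification the abstract claims.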
