WISQ: a restartable architecture using queues

This paper describes the WISQ architecture, which is designed to achieve high performance by exploiting new compiler technology and using a highly segmented pipeline. A highly segmented pipeline allows a very-high-speed clock, but it also makes the pipeline relatively long, so a way must be provided to minimize the effects of pipeline bubbles that form because of data and control dependencies. It is also important to support precise interrupts. These goals are met, in part, by providing a reorder buffer to help restore the machine to a precise state. The architecture then makes the pipelining visible to the programmer/compiler by making the reorder buffer accessible and by explicitly providing that issued instructions cannot be affected by immediately preceding ones. Compiler techniques have been identified that can take advantage of the reorder buffer and permit a sustained execution rate approaching or exceeding one instruction per clock. These techniques include trace scheduling and a relatively easy way to “undo” instructions if the predicted branch path is not taken. We have also studied ways to further reduce the effects of branches by keeping them out of the execution unit: branches are detected and resolved in the instruction fetch unit, so the execution unit is sent a stream of instructions (without branches) that are guaranteed to execute.
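
The reorder-buffer idea in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the WISQ design itself: the class name `ReorderBuffer`, its methods `issue`, `commit`, and `undo`, and the dictionary used as a register file are all hypothetical. It shows how results held in a queue can be committed in order, or discarded to restore a precise state when a predicted branch path turns out to be wrong.

```python
from collections import deque


class ReorderBuffer:
    """Toy in-order-commit queue; every name here is an illustrative assumption."""

    def __init__(self, registers):
        self.registers = dict(registers)  # committed (precise) machine state
        self.pending = deque()            # results issued but not yet committed

    def issue(self, dest, value):
        """Append a result for register `dest` to the tail of the queue."""
        self.pending.append((dest, value))

    def commit(self, count=1):
        """Retire up to `count` of the oldest entries into the registers, in order."""
        for _ in range(min(count, len(self.pending))):
            dest, value = self.pending.popleft()
            self.registers[dest] = value

    def undo(self):
        """Discard all uncommitted entries, e.g. after a mispredicted branch."""
        discarded = len(self.pending)
        self.pending.clear()
        return discarded


def demo():
    rob = ReorderBuffer({"r1": 0, "r2": 0})
    rob.issue("r1", 10)   # instruction before the branch
    rob.commit()          # r1 = 10 becomes part of the precise state
    rob.issue("r2", 99)   # speculative work on the predicted path
    rob.issue("r1", 77)
    dropped = rob.undo()  # prediction was wrong: throw the speculative work away
    return rob.registers, dropped
```

In this sketch, speculative results never touch the register dictionary until `commit` is called, so undoing a wrongly predicted path amounts to clearing the queue. A real implementation would also have to handle memory writes and exceptions, which are left out here.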

Andrew R. Pleszkun | Philip J. Woest | James R. Goodman | Wei-Chung Hsu | P. B. Schechter | George E. Bier | R. T. Joersz
