A comparative performance evaluation of various state maintenance mechanisms

Speculative execution and dynamic scheduling are two promising techniques for achieving high performance in superscalar processors. These techniques require a mechanism for maintaining all architecturally visible machine state. The authors examine the performance implications of three common state maintenance mechanisms: the reorder buffer, the history buffer, and checkpointing. They model the execution of the four integer benchmarks from the SPEC89 suite for a variety of maintenance techniques. They report the results of these measurements and their implications with respect to the design of high-performance superscalar processors.
