Instruction recycling on a multiple-path processor

Processors that can simultaneously execute multiple paths of execution will only exacerbate the fetch bandwidth problem already plaguing conventional processors. On a multiple-path processor which speculatively executes less likely paths of hard-to-predict branches, the work done along a speculative path is normally discarded if that path is found to be incorrect. Instead, it can be beneficial to keep these instruction traces stored in the processor for possible future use. This paper introduces instruction recycling, where previously decoded instructions from recently executed paths are injected back into the rename stage. This increases the supply of instructions to the execution pipeline and decreases fetch latency. In addition, if the operands have not changed for a recycled instruction, the instruction can bypass the issue and execution stages, benefiting from instruction reuse. Instruction recycling and reuse are examined for a simultaneous multithreading architecture with multiple path execution. It is shown to increase performance by 7% for single-program workloads and by 12% on multiple-program workloads.

[1]  Gurindar S. Sohi,et al.  ARB: A Hardware Mechanism for Dynamic Reordering of Memory References , 1996, IEEE Trans. Computers.

[2]  S. McFarling Combining Branch Predictors , 1993 .

[3]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[4]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[5]  Yale N. Patt,et al.  Alternative Implementations of Two-Level Adaptive Branch Prediction , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[6]  E. Smith,et al.  Selective Dual Path Execution , 1996 .

[7]  Kenneth C. Yeager The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.

[8]  Alexandre E. Eichenberger,et al.  Stage scheduling: a technique to reduce the register requirements of a module schedule , 1995, MICRO 1995.

[9]  Antonio González,et al.  Speculative multithreaded processors , 1998, ICS '98.

[10]  Dirk Grunwald,et al.  Fast and accurate instruction fetch and branch prediction , 1994, ISCA '94.

[11]  Margaret Martonosi,et al.  Multipath execution: opportunities and limits , 1998, ICS '98.

[12]  Dirk Grunwald,et al.  Selective eager execution on the PolyPath architecture , 1998, ISCA.

[13]  Gary S. Tyson,et al.  Limited Dual Path Execution , 2000 .

[14]  G.S. Sohi,et al.  Dynamic instruction reuse , 1997, ISCA '97.

[15]  A MahlkeScott,et al.  Dynamic memory disambiguation using the memory conflict buffer , 1994 .

[16]  Augustus K. Uht,et al.  Disjoint eager execution: an optimal form of speculative execution , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[17]  Scott A. Mahlke,et al.  Dynamic memory disambiguation using the memory conflict buffer , 1994, ASPLOS VI.

[18]  Eric Rotenberg,et al.  Assigning confidence to conditional branch predictions , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[19]  Dean M. Tullsen,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).