Exploiting Large Ineffectual Instruction Sequences

A processor executes the full dynamic instruction stream to compute the final output of a program, yet we observe that smaller, equivalent instruction streams produce the same correct output. Based on this observation, we attempt to identify large, dynamically contiguous regions of instructions that are ineffectual as a whole: they contain no writes, only writes that are never referenced, or only writes that do not modify the value of a location. The architectural implication is that instruction fetch and execution can quickly bypass regions predicted to be ineffectual, while another thread of control verifies that the branch predictions implied by the bypass are correct and that the region is truly ineffectual.
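The three conditions above can be illustrated with a small offline check over a recorded trace. This is only a sketch under stated assumptions, not the mechanism the paper proposes: the `Instr` record, the `region_is_ineffectual` function, and the trace layout are hypothetical names introduced here, the check looks ahead in the trace (which hardware could only predict), and it ignores control flow and branch verification entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Instr:
    """Hypothetical trace record: locations read, and location -> value written."""
    reads: tuple = ()
    writes: dict = field(default_factory=dict)

_UNSET = object()  # marks a location with no known value before the region

def region_is_ineffectual(trace, start, end):
    """Return True if trace[start:end] has no net effect visible to later instructions."""
    # Replay writes before the region to learn the incoming values.
    before = {}
    for ins in trace[:start]:
        before.update(ins.writes)

    # Net effect of the region: the last value written to each location.
    after = {}
    for ins in trace[start:end]:
        after.update(ins.writes)

    # Locations whose value actually changed (no writes or silent writes drop out here).
    changed = {loc for loc, val in after.items() if before.get(loc, _UNSET) != val}

    # A changed location matters only if it is read before being overwritten.
    for ins in trace[end:]:
        if changed.intersection(ins.reads):
            return False              # a later instruction references the write
        changed.difference_update(ins.writes)  # overwritten: the write was dead
        if not changed:
            break
    return True                       # remaining writes are never referenced

# Example trace: indices 1-2 form a region with one dead write and one silent write.
trace = [
    Instr(writes={"r1": 5}),
    Instr(reads=("r1",), writes={"r2": 7}),  # r2 is overwritten before any read
    Instr(writes={"r1": 5}),                 # silent write: r1 already holds 5
    Instr(writes={"r2": 9}),
    Instr(reads=("r1", "r2")),
]
print(region_is_ineffectual(trace, 1, 3))
print(region_is_ineffectual(trace, 3, 4))
```

In this sketch a location that is never read again counts as unreferenced, which assumes that program output is itself modeled as a read in the trace.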
