Trace-level reuse

Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and that in many cases the instructions that make up such traces have the same source operand values. The execution of such traces will produce the same outcome, so their execution can be skipped if the processor records the outcome of previous executions. This paper presents an analysis of the performance potential of trace-level reuse and discusses a preliminary realistic implementation. Like instruction-level reuse, trace-level reuse can improve performance by decreasing resource contention and the latency of some instructions. However, we show that trace-level reuse is more effective than instruction-level reuse because the former can avoid fetching the instructions of reused traces. This has two important benefits: it reduces fetch bandwidth requirements, and it increases the effective instruction window size, since reused instructions do not occupy window entries. Moreover, trace-level reuse can compute the result of a whole chain of dependent instructions at once, which may allow the processor to avoid the serialization caused by data dependences and thus potentially exceed the dataflow limit.
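As a minimal sketch of the idea, consider a reuse table indexed by a trace's starting PC together with its live-in operand values, mapping to the trace's live-out values and fall-through PC. This is only an illustration of the lookup-and-skip mechanism, not the paper's actual hardware structure; the names (ReuseTraceMemory, live-in/live-out, next_pc) are assumptions introduced here.

```python
# Sketch of a reuse trace memory (illustrative, not the paper's design).
# A trace is identified by its starting PC plus the values of its live-in
# registers; on a hit, the whole trace is skipped: none of its instructions
# are fetched, decoded, or executed, and its outputs are written back at once.

class ReuseTraceMemory:
    def __init__(self):
        # (start_pc, live-in values) -> (live-out values, PC after the trace)
        self.table = {}

    def record(self, start_pc, live_in_vals, live_out_vals, next_pc):
        self.table[(start_pc, live_in_vals)] = (live_out_vals, next_pc)

    def lookup(self, start_pc, live_in_vals):
        return self.table.get((start_pc, live_in_vals))


# Usage: a trace at PC 0x40 reads r1 and r2, produces r3, and falls
# through to 0x80. A later dynamic instance with the same live-in values
# reuses the recorded outcome instead of re-executing the trace.
rtm = ReuseTraceMemory()
rtm.record(0x40, live_in_vals=(5, 7), live_out_vals={"r3": 12}, next_pc=0x80)

regs = {"r1": 5, "r2": 7, "r3": 0}
hit = rtm.lookup(0x40, (regs["r1"], regs["r2"]))
if hit:
    live_outs, pc = hit
    regs.update(live_outs)  # chain of dependent instructions resolved at once
else:
    pc = 0x40               # miss: execute the trace normally, then record it
```

Note how a hit resolves a dependent chain in a single step, which is why trace-level reuse can, in principle, exceed the dataflow limit that bounds instruction-by-instruction execution.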
