This paper presents a study of the performance limits of data value reuse. Two types of data value reuse are considered: instruction-level reuse and trace-level reuse. The former reuses instances of single instructions, whereas the latter reuses sequences of instructions as an atomic unit. Two scenarios are considered: an infinite-resource machine and a machine with a limited instruction window. The results show that reuse is abundant in the SPEC applications. Instruction-level reuse may provide a significant speedup, but that speedup drops dramatically when the reuse latency is considered. Trace-level reuse generally has less potential in the unlimited-window scenario, but it is much more effective in the limited-window configuration. This is because trace-level reuse, in addition to reducing the execution latency, increases the effective instruction window size by avoiding the fetch and execution of sequences of instructions. Overall, trace-level reuse is shown to be a promising approach, since it can provide speedups of around 3 for a 256-entry instruction window and a realistic reuse latency.
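As an illustration of what instruction-level reuse means, the following is a minimal sketch and not the paper's simulation methodology. It assumes a hypothetical unbounded reuse table keyed by the instruction's program counter and its source operand values; the record layout, the toy `execute` function, and the name `count_instruction_reuse` are all assumptions introduced for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical dynamic instruction record: (pc, opcode, source operand values).
Instr = Tuple[int, str, Tuple[int, ...]]


def execute(opcode: str, operands: Tuple[int, ...]) -> int:
    """Toy ALU covering only the opcodes used in this sketch."""
    if opcode == "add":
        return operands[0] + operands[1]
    if opcode == "mul":
        return operands[0] * operands[1]
    raise ValueError(f"unsupported opcode: {opcode}")


def count_instruction_reuse(stream: List[Instr]) -> Tuple[int, List[int]]:
    """Return (reused instance count, results) using an unbounded reuse table."""
    # Assumption: an instance is reusable when the same static instruction
    # (same pc) has already executed with identical source operand values.
    table: Dict[Tuple[int, Tuple[int, ...]], int] = {}
    reused = 0
    results: List[int] = []
    for pc, opcode, operands in stream:
        key = (pc, operands)
        if key in table:
            # Reuse: take the stored result instead of executing again.
            reused += 1
            results.append(table[key])
        else:
            value = execute(opcode, operands)
            table[key] = value
            results.append(value)
    return reused, results


stream: List[Instr] = [
    (0x10, "add", (1, 2)),
    (0x14, "mul", (3, 4)),
    (0x10, "add", (1, 2)),  # same pc, same inputs: reusable
    (0x10, "add", (5, 2)),  # same pc, different inputs: not reusable
    (0x14, "mul", (3, 4)),  # reusable
]
reused, results = count_instruction_reuse(stream)
print(reused, results)
```

Trace-level reuse would apply the same lookup to a whole sequence of instructions, keyed by the sequence's live-in values, so that a hit skips both the fetch and the execution of every instruction in the sequence.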