Compiler-directed dynamic computation reuse: rationale and initial results

Recent studies on value locality reveal that many instructions are frequently executed with a small variety of inputs. This paper proposes an approach that integrates architecture and compiler techniques to exploit value locality for large regions of code. The approach strives to eliminate redundant processor execution caused by both instruction-level input repetition and recurrence of input data within high-level computations. In this approach, the compiler performs analysis to identify code regions whose computation can be reused during dynamic execution. The instruction set architecture provides a simple interface for the compiler to communicate the scope of each reuse region and its live-out register information to the hardware. During run time, the execution results of these reusable computation regions are recorded into hardware buffers for potential reuse. Each reuse can eliminate the execution of a large number of dynamic instructions. Furthermore, the actions needed to update the live-out registers can be performed with a higher degree of parallelism than in the original code, bypassing the dataflow dependence constraints intrinsic to that code. Initial results show that the compiler analysis can indeed identify large reuse regions. Overall, the approach can improve the performance of a 6-issue microarchitecture by an average of 30% for a collection of SPEC and integer benchmarks.
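The abstract describes the reuse mechanism only at a high level. The following Python sketch is a toy software model under our own assumptions, not the paper's hardware design: a hypothetical `ReuseBuffer`, keyed by region ID and the values of the region's live-in registers, records each region's live-out registers and its dynamic instruction count. On a hit, the recorded live-outs are written in one step and the region body is skipped. The `ways` limit, LRU replacement, the `invalidate` hook (standing in for memory-dependence handling), and the `region_body` cost model are all illustrative assumptions.

```python
from collections import OrderedDict
from typing import Callable, Dict, Tuple

Regs = Dict[str, int]  # toy model: register name -> value
RegionBody = Callable[[Regs], Tuple[Regs, int]]  # returns (live-outs, dynamic instruction count)


class ReuseBuffer:
    """Toy per-region reuse buffer: live-in values -> recorded live-out values."""

    def __init__(self, ways: int = 4) -> None:
        self.ways = ways  # assumed: max recorded instances per region, LRU replacement
        self.table: Dict[int, OrderedDict] = {}
        self.hits = 0
        self.instructions_skipped = 0

    def execute(self, region_id: int, live_ins: Tuple[str, ...],
                regs: Regs, body: RegionBody) -> bool:
        """Run one reuse region; return True if a recorded result was reused."""
        key = tuple(regs[name] for name in live_ins)
        entries = self.table.setdefault(region_id, OrderedDict())
        if key in entries:
            entries.move_to_end(key)  # mark this instance as most recently used
            live_outs, n_insts = entries[key]
            regs.update(live_outs)  # all live-outs written in one step; body skipped
            self.hits += 1
            self.instructions_skipped += n_insts
            return True
        live_outs, n_insts = body(dict(regs))  # miss: execute the region normally
        regs.update(live_outs)
        entries[key] = (dict(live_outs), n_insts)
        if len(entries) > self.ways:
            entries.popitem(last=False)  # evict the least recently used instance
        return False

    def invalidate(self, region_id: int) -> None:
        """Assumed hook: drop a region's entries, e.g. after a store to memory it read."""
        self.table.pop(region_id, None)


def region_body(regs: Regs) -> Tuple[Regs, int]:
    """Hypothetical region whose live-outs depend only on r1 and r2."""
    acc = 0
    for i in range(regs["r1"]):
        acc += i * regs["r2"]
    # Assumed cost model: 3 dynamic instructions per loop iteration.
    return {"r3": acc, "r4": regs["r1"]}, 3 * regs["r1"]


if __name__ == "__main__":
    rb = ReuseBuffer(ways=2)
    regs = {"r1": 10, "r2": 3, "r3": 0, "r4": 0}
    rb.execute(7, ("r1", "r2"), regs, region_body)  # miss: body runs
    rb.execute(7, ("r1", "r2"), regs, region_body)  # hit: live-outs copied
    print(rb.hits, rb.instructions_skipped, regs["r3"])  # prints: 1 30 135
```

In this model a single hit skips every dynamic instruction of the region at once, which is the source of the large per-reuse savings the abstract describes; a real design must also account for memory inputs, which the sketch only approximates with the coarse `invalidate` hook.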
