Static identification of delinquent loads

Effective use of processor caches is crucial to application performance. Cache misses are known to be unevenly distributed throughout a program: in applications running on RISC-style processors, a small number of delinquent load instructions are responsible for most of the cache misses. Identifying these delinquent loads is key to the success of many cache optimization and prefetching techniques. We propose a method for identifying delinquent loads that can be implemented at compile time. Our experiments over eighteen benchmarks from the SPEC suite show that the proposed scheme is stable across benchmarks, inputs, and cache structures: on average, it selects 10% of the load instructions in the benchmarks we tested, and these loads account for over 90% of all data cache misses. As far as we know, this is the first time a static delinquent load identification technique with this level of precision and coverage has been reported. Comparable techniques can also identify load instructions that cover 90% of all data cache misses, but they do so by selecting over 50% of all load instructions in the code, which results in a high number of false positives. When basic block profiling is used in conjunction with our heuristic, our results show that it is possible to pin down just 1.3% of the load instructions, which account for 82% of all data cache misses.
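The two figures the abstract reports for each scheme, the fraction of loads selected and the fraction of data cache misses those loads cover, can be illustrated with a minimal sketch. It is not the paper's heuristic; it only shows how such figures might be computed once a set of loads has been flagged. The function name `selection_and_coverage`, the load labels, and the miss counts are hypothetical values chosen for illustration, not data from the paper.

```python
def selection_and_coverage(miss_counts, flagged):
    """Return (fraction of loads selected, fraction of misses covered).

    miss_counts: dict mapping a static load identifier to its number of
                 data cache misses (hypothetical profile data).
    flagged:     iterable of load identifiers a classifier marked delinquent.
    """
    total_loads = len(miss_counts)
    total_misses = sum(miss_counts.values())
    # Ignore flagged identifiers that do not appear in the profile.
    selected = set(flagged) & set(miss_counts)
    covered = sum(miss_counts[load] for load in selected)
    selection = len(selected) / total_loads if total_loads else 0.0
    coverage = covered / total_misses if total_misses else 0.0
    return selection, coverage


# Hypothetical program with ten static loads and a skewed miss distribution.
example_misses = {
    "ld0": 900, "ld1": 40, "ld2": 30, "ld3": 10, "ld4": 10,
    "ld5": 5, "ld6": 5, "ld7": 0, "ld8": 0, "ld9": 0,
}

# A precise classifier flags only the single heavy load.
precise = selection_and_coverage(example_misses, ["ld0"])

# A less precise classifier reaches higher coverage by flagging half the loads.
broad = selection_and_coverage(
    example_misses, ["ld0", "ld1", "ld2", "ld7", "ld8"]
)
```

In this toy profile the first classifier selects one load in ten and still covers nine misses in ten, while the second selects half of the loads, two of which never miss. Flagged loads that contribute few or no misses are the false positives the abstract refers to.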
