The Spatial Characteristics of Load Instructions

Relatively little prior work has examined the miss behavior of all static and dynamic load instructions, especially in the context of the entire program. This study addresses this gap by presenting whole-program (as opposed to sampled) profiling results for load behavior. Specifically, it confirms the conclusion of previous work that a very small percentage of static loads (i.e., PCs) account for a disproportionately large percentage of the total L1 cache misses. It also shows that an equally small percentage of unique effective addresses (EAs) account for a comparable percentage of the total L1 misses. In other words, a few "hot" PCs and EAs are responsible for a large share of all L1 misses. Furthermore, a surprisingly large percentage of dynamic loads can bypass the memory hierarchy entirely by obtaining their values through store forwarding. More broadly, the main contribution of this work is to quantify the whole-program behavior of static and dynamic loads using several metrics that either confirm or contradict conventional wisdom and previous work.
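The concentration metric behind the "hot PC" and "hot EA" observation is the share of all L1 misses caused by the top few percent of unique keys (static load PCs or effective addresses), ranked by miss count. Below is a minimal sketch of that calculation. It assumes a hypothetical per-load trace format (`LoadRecord` with `pc`, `ea`, and an `l1_miss` flag, as a cache simulator might emit). The names are illustrative only and do not describe the study's actual profiling infrastructure. Every unique key seen in the trace, including keys that never miss, counts toward the population from which the "top fraction" is taken.

```python
import math
from collections import Counter
from typing import Iterable, NamedTuple


class LoadRecord(NamedTuple):
    """Hypothetical trace record for one dynamic load (illustrative format)."""
    pc: int        # address of the static load instruction
    ea: int        # effective (data) address accessed by the load
    l1_miss: bool  # True if this dynamic load missed in the L1 data cache


def miss_coverage(trace: Iterable[LoadRecord], key: str, top_frac: float) -> float:
    """Return the fraction of all L1 misses caused by the hottest `top_frac`
    of unique keys, where `key` is "pc" (static loads) or "ea" (addresses).

    The population of keys includes every unique key seen in the trace,
    so keys that always hit still count toward the total.
    """
    if key not in ("pc", "ea"):
        raise ValueError("key must be 'pc' or 'ea'")
    if not 0.0 < top_frac <= 1.0:
        raise ValueError("top_frac must be in (0, 1]")

    seen = set()           # all unique keys, whether or not they miss
    misses = Counter()     # miss count per key
    for rec in trace:
        k = getattr(rec, key)
        seen.add(k)
        if rec.l1_miss:
            misses[k] += 1

    total = sum(misses.values())
    if total == 0:
        return 0.0

    # Number of keys treated as "hot": the top fraction of the population.
    n_hot = math.ceil(top_frac * len(seen))
    covered = sum(count for _, count in misses.most_common(n_hot))
    return covered / total


# Tiny made-up trace: one static load (0x400) causes most of the misses.
example_trace = [
    LoadRecord(0x400, 0x1000, True),
    LoadRecord(0x400, 0x1040, True),
    LoadRecord(0x400, 0x1080, True),
    LoadRecord(0x404, 0x2000, False),
    LoadRecord(0x408, 0x2000, False),
    LoadRecord(0x40C, 0x3000, True),
]

print(miss_coverage(example_trace, "pc", 0.25))  # 0.75
print(miss_coverage(example_trace, "ea", 0.25))  # 0.5
```

In this toy trace, one of four static loads (25%) accounts for 75% of the misses. Because each miss here touches a distinct address, the same 25% cutoff over effective addresses covers only half of them. A whole-program profile would feed the full dynamic load stream through the same function rather than a sample.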
