Memory performance analysis of SPEC2000C for the Intel(R) Itanium(TM) processor

We describe our memory performance analysis of SPEC2000C on the newly released Intel(R) Itanium(TM) processor (IPF). Memory overhead is very significant for SPEC2000C: on average, 39% of cycles are spent in data stalls. Cache misses are a major contributor, but data translation (DTLB) performance also affects many benchmarks. We present a study based on measurements collected from the hardware performance counters and on cache profiling through program instrumentation of loads and stores. We define important loads as the load sites that together contribute at least 95% of the cache misses at all levels. Our measurements show that the number of important loads in a program is relatively small. Our analysis shows that important loads are usually contained in inner loops, and that these loops have high trip counts. We present preliminary results on using stride profiling to reduce the cache misses of important loads, which yields an average improvement of 6% on SPEC2000C. Finally, we present our study of data translation performance and propose design choices.
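The following Python sketch illustrates the two profiling ideas above. It assumes a profile that maps each load site to its miss count summed over all cache levels, plus a per-site trace of data addresses for stride profiling. Under these assumptions, `important_loads` selects load sites in descending order of misses until they cover 95% of the total, and `dominant_stride` reports a load's most frequent address delta. The function names, the site labels, and the 70% dominance threshold are hypothetical choices for illustration. They are not the instrumentation or heuristics used in this study.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def important_loads(misses: Dict[str, int], coverage: float = 0.95) -> List[str]:
    """Pick load sites, heaviest first, until they account for `coverage`
    of all misses (hypothetical helper; miss counts summed over cache levels)."""
    total = sum(misses.values())
    if total == 0:
        return []
    selected: List[str] = []
    covered = 0
    for site, count in sorted(misses.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break  # the sites chosen so far already reach the coverage target
        selected.append(site)
        covered += count
    return selected


def dominant_stride(addresses: Sequence[int], min_fraction: float = 0.7) -> Optional[int]:
    """Return the most common delta between consecutive addresses of one load
    site if it covers at least `min_fraction` of all deltas, else None.
    The 0.7 threshold is an assumed value, not taken from the paper."""
    if len(addresses) < 2:
        return None
    deltas = Counter(b - a for a, b in zip(addresses, addresses[1:]))
    stride, freq = deltas.most_common(1)[0]
    if freq / (len(addresses) - 1) >= min_fraction:
        return stride
    return None


if __name__ == "__main__":
    # Hypothetical profile: one site dominates, as the abstract reports.
    profile = {"ld_a": 900, "ld_b": 60, "ld_c": 30, "ld_d": 10}
    print(important_loads(profile))
    # Address trace of a load walking 64-byte elements, with one outlier jump.
    print(dominant_stride([0x1000, 0x1040, 0x1080, 0x10C0, 0x2000]))
```

A site with a stable stride, such as the 64-byte walk in the usage example, is a natural candidate for a prefetch a fixed number of strides ahead. Sites without a dominant stride are left alone.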
