StatCache: a probabilistic approach to efficient and accurate data locality analysis

The widening memory gap reduces performance of applications with poor data locality. Therefore, there is a need for methods to analyze data locality and help application optimization. In this paper we present StatCache, a novel sampling-based method for performing data-locality analysis on realistic workloads. StatCache is based on a probabilistic model of the cache, rather than a functional cache simulator. It uses statistics from a single run to accurately estimate miss ratios of fully-associative caches of arbitrary sizes and generate working-set graphs. We evaluate StatCache using the SPEC CPU2000 benchmarks and show that StatCache gives accurate results with a sampling rate as low as 10/sup -4/. We also provide a proof-of-concept implementation, and discuss potentially very fast implementation alternatives.

[1]  Scott Devine,et al.  Using the SimOS machine simulator to study complex computer systems , 1997, TOMC.

[2]  Sharad Malik,et al.  Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.

[3]  Margaret Martonosi,et al.  MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.

[4]  David A. Wood,et al.  Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.

[5]  Robert Zak,et al.  SMP System Interconnect Instrumentation for Performance Analysis , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[6]  Erik Hagersten Simple and Efficient Instrumentation for the DSZOOM System , 2003 .

[7]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[8]  Jeffrey K. Hollingsworth,et al.  SIGMA: A Simulator Infrastructure to Guide Memory Analysis , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[9]  David Black-Schaffer StatCache: A Probabilistic Approach to Efficient and Accurate Data Locality Analysis , 2004 .

[10]  Håkan Grahn,et al.  SimICS/Sun4m: A Virtual Workstation , 1998, USENIX Annual Technical Conference.

[11]  Amitabh Srivastava,et al.  Analysis Tools , 2019, Public Transportation Systems.

[12]  James R. Larus,et al.  EEL: machine-independent executable editing , 1995, PLDI '95.

[13]  David A. Wood,et al.  A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches , 1994, IEEE Trans. Computers.

[14]  David A. Wood,et al.  A model for estimating trace-sample miss ratios , 1991, SIGMETRICS '91.

[15]  Thomas M. Conte,et al.  Combining Trace Sampling with Single Pass Methods for Efficient Cache Simulation , 1998, IEEE Trans. Computers.

[16]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, ACM Trans. Comput. Syst..

[17]  Jack J. Dongarra,et al.  A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[18]  Margaret Martonosi,et al.  Tuning Memory Performance of Sequential and Parallel Programs , 1995, Computer.

[19]  Jeffrey K. Hollingsworth,et al.  Using Hardware Performance Monitors to Isolate Memory Bottlenecks , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[20]  Koen De Bosschere,et al.  DIOTA: Dynamic Instrumentation, Optimization and Transformation of Applications , 2002, PACT 2002.

[21]  John Flynn,et al.  Adapting the SPEC 2000 benchmark suite for simulation-based computer architecture research , 2001 .

[22]  David B. Whalley,et al.  Tools for application-oriented performance tuning , 2001, ICS '01.