Phase guided profiling for fast cache modeling

Statistical cache models are powerful tools for understanding application behavior as a function of cache allocation. However, previous techniques have modeled only the average application behavior, which hides the effect of program variations over time. Without detailed time-based information, transient behavior, such as exceeding bandwidth or cache capacity, may be missed. Yet these events, while short, often play a disproportionate role and are critical to understanding program behavior. In this work we extend earlier techniques to incorporate program phase information when collecting runtime profiling data. This allows us to model an application's cache miss ratio as a function of its cache allocation over time. To reduce overhead and improve accuracy we use online phase detection and phase-guided profiling. The phase-guided profiling reduces overhead by more intelligently selecting portions of the application to sample, while accuracy is improved by combining samples from different instances of the same phase. The result is a new technique that accurately models the time-varying behavior of an application's miss ratio as a function of its cache allocation on modern hardware. By leveraging phase-guided profiling, this work both improves on the accuracy of previous techniques and reduces the overhead.

[1]  David Eklov,et al.  Efficient software-based online phase classification , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).

[2]  Brad Calder,et al.  Structures for phase classification , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[3]  Brad Calder,et al.  Phase tracking and prediction , 2003, ISCA '03.

[4]  Brad Calder,et al.  Transition phase classification and prediction , 2005, 11th International Symposium on High-Performance Computer Architecture.

[5]  Brad Calder,et al.  Detecting phases in parallel applications on shared memory architectures , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

[6]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[7]  Bilha Mendelson,et al.  Detecting Change in Program Behavior for Adaptive Optimization , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[8]  Chen Ding,et al.  Locality phase prediction , 2004, ASPLOS XI.

[9]  Irving L. Traiger,et al.  Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..

[10]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[11]  James E. Smith,et al.  Managing multi-configuration hardware via dynamic working set analysis , 2002, ISCA.

[12]  Margaret Martonosi,et al.  Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[13]  Erik Hagersten,et al.  A statistical multiprocessor cache model , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[14]  Chandra Krintz,et al.  Online phase detection algorithms , 2006, International Symposium on Code Generation and Optimization (CGO'06).

[15]  Hridesh Rajan,et al.  Phase-based tuning for better utilization of performance-asymmetric multicore processors , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[16]  Chandra Krintz,et al.  Phase-aware remote profiling , 2005, International Symposium on Code Generation and Optimization.

[17]  Erik Hagersten,et al.  Fast data-locality profiling of native execution , 2005, SIGMETRICS '05.

[18]  Brad Calder,et al.  The Strong correlation Between Code Signatures and Performance , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[19]  Yan Solihin,et al.  Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.

[20]  David Eklov,et al.  StatStack: Efficient modeling of LRU caches , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[21]  Erik Hagersten,et al.  StatCache: a probabilistic approach to efficient and accurate data locality analysis , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[22]  David A. Wood,et al.  Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.

[23]  David Eklov,et al.  Fast modeling of shared caches in multicore systems , 2011, HiPEAC.

[24]  Michael Stumm,et al.  RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations , 2009, ASPLOS.

[25]  Ryan N. Rakvic,et al.  The Fuzzy Correlation between Code and Performance Predictability , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[26]  Margo I. Seltzer,et al.  Performance of Multithreaded Chip Multiprocessors and Implications for Operating System Design , 2005, USENIX Annual Technical Conference, General Track.

[27]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[28]  James E. Smith,et al.  Comparing Program Phase Detection Techniques , 2003, MICRO.

[29]  Kathryn S. McKinley,et al.  Decomposing memory performance: data structures and phases , 2006, ISMM '06.

[30]  Brad Calder,et al.  Motivation for Variable Length Intervals and Hierarchical Phase Behavior , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[31]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.