A Run-Time Program Phase Detection Technique for Optimizing Per-Phase L2 Cache Demand

Article history: Received 31 March 2016; received in revised form 25 May 2016; accepted 1 June 2016; available online 13 July 2016.

Understanding program behavior is fundamental to computer architecture and program optimization. Programs pass through distinct behaviors in which their performance characteristics and hardware resource requirements vary. Program phase detection and classification research, which aims to understand a program's time-varying behavior, can enable many phase-based optimizations that are specifically tailored to improve the performance of each individual program phase. In this paper, we introduce an efficient run-time phase detection and classification technique based on tracking changes in the L2 cache access pattern across different portions of program execution. The proposed technique monitors a running program and keeps track of which phase it is currently executing, without recompiling the tracked program and with an average execution time overhead of 4%. The Performance Monitoring Unit (PMU) is used to sample the memory addresses that cause L1 data cache misses. This profiling data is used to construct Cache Access Signature Vectors (CASVs) that accurately reflect the L2 cache access pattern of each execution interval. By comparing CASVs, the proposed technique classifies the program into a set of stable phases with a high degree of intra-phase homogeneity. Our evaluation shows that the phase changes detected by our technique correlate strongly with variation in Instructions Per Cycle (IPC). Furthermore, because it can directly estimate per-phase L2 cache demand, our technique can help reduce L2 cache miss rates and optimize L2 cache utilization.
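The abstract does not specify the exact CASV layout, comparison metric, or thresholds, so the following Python sketch fills them in with assumptions purely for illustration. Each sampled L1 data cache miss address (assumed to be already collected from the PMU) is reduced to a cache-line number and hashed into a fixed-length bit vector. Two vectors are compared with the relative distance |A XOR B| / |A OR B|, a metric used in earlier working-set signature work and not necessarily the one this paper uses. Per-phase L2 demand is approximated as the number of set bits times the line size. `LINE_SIZE`, `SIG_BITS`, `THRESHOLD`, `build_casv`, `distance`, `estimated_l2_demand` and `PhaseTable` are hypothetical names, not the authors' implementation.

```python
from dataclasses import dataclass, field

# Hypothetical parameters chosen for illustration; not values from the paper.
LINE_SIZE = 64    # assumed L2 cache line size in bytes
SIG_BITS = 1024   # assumed signature length in bits
THRESHOLD = 0.5   # assumed max distance for two intervals to share a phase


def popcount(x: int) -> int:
    """Count set bits (portable to Python versions without int.bit_count)."""
    return bin(x).count("1")


def build_casv(sampled_addrs, line_size=LINE_SIZE, sig_bits=SIG_BITS) -> int:
    """Build a bit-vector signature (stored as an int) for one interval.

    Each sampled L1D-miss address is reduced to its cache-line number,
    which is hashed (here: modulo) into one of sig_bits positions.
    """
    casv = 0
    for addr in sampled_addrs:
        line = addr // line_size
        casv |= 1 << (line % sig_bits)
    return casv


def distance(a: int, b: int) -> float:
    """Relative distance |a XOR b| / |a OR b|: 0.0 = identical, 1.0 = disjoint."""
    union = a | b
    if union == 0:
        return 0.0
    return popcount(a ^ b) / popcount(union)


def estimated_l2_demand(casv: int, line_size=LINE_SIZE) -> int:
    """Rough footprint in bytes: distinct signature bits times line size.

    Sampling and hash collisions both tend to make this an underestimate.
    """
    return popcount(casv) * line_size


@dataclass
class PhaseTable:
    threshold: float = THRESHOLD
    signatures: list = field(default_factory=list)  # one representative CASV per phase
    demand: dict = field(default_factory=dict)      # phase ID -> largest estimated demand seen

    def classify(self, casv: int) -> int:
        """Return the closest known phase ID, or register casv as a new phase."""
        best_id, best_dist = None, None
        for pid, sig in enumerate(self.signatures):
            d = distance(casv, sig)
            if best_dist is None or d < best_dist:
                best_id, best_dist = pid, d
        if best_id is None or best_dist > self.threshold:
            self.signatures.append(casv)
            best_id = len(self.signatures) - 1
        self.demand[best_id] = max(self.demand.get(best_id, 0), estimated_l2_demand(casv))
        return best_id


if __name__ == "__main__":
    table = PhaseTable()
    a = [0x1000 + 64 * i for i in range(100)]     # interval touching 100 lines
    b = [0x1000 + 64 * i for i in range(5, 105)]  # largely the same lines as a
    c = [0x8000 + 64 * i for i in range(300)]     # a different, larger region
    print([table.classify(build_casv(x)) for x in (a, b, c)])  # [0, 0, 1]
    print(table.demand)  # {0: 6400, 1: 19200}
```

In this sketch each new interval either joins the nearest known phase or starts a new one, so the threshold trades the number of phases against intra-phase homogeneity, and the per-phase demand table is the kind of estimate a cache-partitioning policy could consume.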
