Shadow Profiling: Hiding Instrumentation Costs with Parallelism

In profiling, a tradeoff exists between information and overhead. For example, hardware-sampling profilers incur negligible overhead, but the information they collect is consequently very coarse. Other profilers use instrumentation tools to gather temporal traces such as path profiles and hot memory streams, but they have high overhead. Runtime and feedback-directed compilation systems need detailed information to aggressively optimize, but the cost of gathering profiles can outweigh the benefits. Shadow profiling is a novel method for sampling long traces of instrumented code in parallel with normal execution, taking advantage of the trend of increasing numbers of cores. Each instrumented sample can be many millions of instructions in length. The primary goal is to incur negligible overhead, yet attain profile information that is nearly as accurate as a perfect profile. The profiler requires no modifications to the operating system or hardware, and is tunable to allow for greater coverage or lower overhead. We evaluate the performance and accuracy of this new profiling technique for two common types of instrumentation-based profiles: interprocedural path profiling and value profiling. Overall, profiles collected using the shadow profiling framework are 94% accurate versus perfect value profiles, while incurring less than 1% overhead. Consequently, this technique increases the viability of dynamic and continuous optimization systems by hiding the high overhead of instrumentation and enabling the online collection of many types of profiles that were previously too costly

[1]  Brad Calder,et al.  Value Profiling and Optimization , 1999, J. Instr. Level Parallelism.

[2]  E. Duesterwald,et al.  Software profiling for hot path prediction: less is more , 2000, SIGP.

[3]  James R. Larus,et al.  Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[4]  Burzin A. Patel,et al.  Using branch handling hardware to support profile-driven optimization , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[5]  Kim M. Hazelwood,et al.  SuperPin: Parallelizing Dynamic Instrumentation for Real-Time Performance , 2007, International Symposium on Code Generation and Optimization (CGO'07).

[6]  Wei-Chung Hsu,et al.  Design and Implementation of a Lightweight Dynamic Optimization System , 2004, J. Instr. Level Parallelism.

[7]  Michael D. Smith,et al.  Ephemeral Instrumentation for Lightweight Program Profiling , 1997 .

[8]  Martin Hirzel,et al.  Bursty Tracing: A Framework for Low-Overhead Temporal Profiling , 2001 .

[9]  Brinkley Sprunt,et al.  Pentium 4 Performance-Monitoring Features , 2002, IEEE Micro.

[10]  Louise E. Moser,et al.  Deterministic scheduling for multithreaded replicas , 2005, 10th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems.

[11]  Wei-Chung Hsu,et al.  Dynamic trace selection using performance monitoring hardware sampling , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[12]  Gurindar S. Sohi,et al.  A programmable co-processor for profiling , 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.

[13]  Y. N. Srikant,et al.  A programmable hardware path profiler , 2005, International Symposium on Code Generation and Optimization.

[14]  Michael D. Smith,et al.  Persistent Code Caching: Exploiting Code Reuse Across Executions and Applications , 2007, International Symposium on Code Generation and Optimization (CGO'07).

[15]  Martin Hirzel,et al.  Dynamic hot data stream prefetching for general-purpose programs , 2002, PLDI '02.

[16]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, ACM Trans. Comput. Syst..

[17]  James E. Smith,et al.  Relational profiling: enabling thread-level parallelism in virtual machines , 2000, MICRO 33.

[18]  Tipp Moseley,et al.  Analysis of path profiling information generated with performance monitoring hardware , 2005, 9th Annual Workshop on Interaction between Compilers and Computer Architectures (INTERACT'05).

[19]  Trishul M. Chilimbi Efficient representations and abstractions for quantifying and exploiting data reference locality , 2001, PLDI '01.

[20]  John C. Gyllenhaal,et al.  A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization , 1999, ISCA.

[21]  David W. Wall,et al.  Predicting program behavior using real or estimated profiles , 2004, SIGP.

[22]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[23]  Matthew Arnold,et al.  A framework for reducing the cost of instrumented code , 2001, PLDI '01.

[24]  Robert S. Cohn,et al.  Optimizing Alpha Executables on Windows NT with Spike , 1998, Digit. Tech. J..

[25]  Galen C. Hunt,et al.  Detours: binary interception of Win32 functions , 1999 .

[26]  Nicholas Nethercote,et al.  Valgrind: A Program Supervision Framework , 2003, RV@CAV.

[27]  Ravishankar K. Iyer,et al.  A preemptive deterministic scheduling algorithm for multithreaded replicas , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[28]  Thomas F. Wenisch,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, ISCA '03.

[29]  Derek Bruening,et al.  An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[30]  Jeffrey K. Hollingsworth,et al.  An API for Runtime Code Patching , 2000, Int. J. High Perform. Comput. Appl..

[31]  John Whaley,et al.  A portable sampling-based profiler for Java virtual machines , 2000, JAVA '00.

[32]  Jeffrey Dean,et al.  ProfileMe: hardware support for instruction-level profiling on out-of-order processors , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[33]  J. Larus Whole program paths , 1999, PLDI '99.

[34]  James E. Smith,et al.  Rapid profiling via stratified sampling , 2001, ISCA 2001.

[35]  Kerstin Eder,et al.  International Symposium on Code Generation and Optimization. CGO 2003 , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..