Practical path profiling for dynamic optimizers

Modern processors are hungry for instructions. To satisfy them, compilers need to find and optimize execution paths that span multiple basic blocks. Path profiles provide this context, but their high overhead has so far limited their use by dynamic compilers. We present new techniques for low-overhead online path profiling, which we call practical path profiling (PPP). Following targeted path profiling (TPP), PPP uses an edge profile to simplify path profile instrumentation (profile-guided profiling). PPP improves over prior work by (1) reducing the amount of profiling instrumentation on cold paths and on paths that the edge profile predicts well and (2) reducing the cost of the remaining instrumentation. Our experiments, run in an ahead-of-time compiler, perform edge profile-guided inlining and unrolling before adding path profiling instrumentation. These transformations are faithful to staged optimization and create longer, harder-to-predict paths. We introduce the branch-flow metric, which measures path flow as a function of branch decisions rather than weighting all paths equally as in prior work. On SPEC2000, PPP maintains high accuracy and coverage with only 5% overhead on average (ranging from -3% to 13%), making it appealing for use by dynamic compilers.
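The abstract does not spell out what path profiling instrumentation looks like. As background only, the sketch below shows the classic Ball-Larus path numbering scheme ("Efficient path profiling"), which path profilers in this line of work generally start from. It is not the PPP algorithm itself. It assigns each edge of an acyclic control-flow graph an integer increment so that the sum of increments along any entry-to-exit path is a unique path ID. The graph, node names, and function names are hypothetical and chosen for illustration.

```python
def ball_larus_edge_values(dag, exit_node):
    """Assign Ball-Larus increments to the edges of an acyclic CFG.

    dag maps each node to an ordered list of its successors.
    Returns (num_paths, edge_val): the number of paths from each node
    to exit_node, and the increment attached to each edge.
    """
    num_paths = {}
    edge_val = {}

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        if v == exit_node:
            num_paths[v] = 1
            return 1
        total = 0
        for w in dag[v]:
            # The increment for edge v->w is the number of paths that
            # leave v through earlier successors.
            edge_val[(v, w)] = total
            total += visit(w)
        num_paths[v] = total
        return total

    for node in dag:
        visit(node)
    return num_paths, edge_val


def all_paths(dag, node, exit_node):
    """Enumerate every path from node to exit_node (illustration only)."""
    if node == exit_node:
        return [[node]]
    return [[node] + rest
            for succ in dag[node]
            for rest in all_paths(dag, succ, exit_node)]


def path_id(path, edge_val):
    """Sum the edge increments along a path, as instrumentation would at run time."""
    return sum(edge_val[(a, b)] for a, b in zip(path, path[1:]))


# Hypothetical six-path acyclic CFG with entry A and exit F.
DAG = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["E", "F"],
    "E": ["F"],
    "F": [],
}

num_paths, edge_val = ball_larus_edge_values(DAG, "F")
ids = [path_id(p, edge_val) for p in all_paths(DAG, "A", "F")]
```

In an instrumented program, each nonzero increment becomes an add to a per-invocation register, and the final sum indexes a counter table. The techniques summarized in the abstract aim to remove or cheapen that work on cold paths and on paths that the edge profile already predicts well.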
