Taming Hardware Event Samples for Precise and Versatile Feedback Directed Optimizations

Feedback-directed optimization (FDO) is effective at improving application runtime performance, but it has not been widely adopted because of its tedious dual-compilation model, the difficulty of generating representative training data sets, and the high runtime overhead of profile collection. Hardware-event sampling overcomes these drawbacks by providing a lightweight way to collect execution profiles in the production environment, which naturally consumes representative input. However, hardware event samples are typically not precise at instruction or basic-block granularity, and these inaccuracies lead to missed performance relative to instrumentation-based FDO. In this paper, we use Performance Monitoring Unit (PMU)-based sampling to collect instruction frequency profiles. By collecting profiles with multiple events and applying heuristics to predict their accuracy, we improve the accuracy of the profile. We also show how emerging techniques can further improve the accuracy of the sample-based profile. These emerging techniques are additionally used to collect value profiles and to assist a lightweight interprocedural optimizer. All of these profiles are represented in a portable form, so they can be used across different platforms. We demonstrate that sampling-based FDO achieves, on average, 92 percent of the performance gains obtained using instrumentation-based exact profiles on both the SPEC CINT2000 and CINT2006 benchmarks. The overhead of collection is only 0.93 percent on average, whereas compiler-based instrumentation incurs 2.0-351.5 percent overhead (and 10x overhead on an industrial web search application).
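
To make the idea of a sample-based instruction frequency profile concrete, the following is a minimal sketch of how sampled instruction addresses might be turned into basic-block execution estimates. It is not the paper's implementation. It assumes samples come from a retired-instruction event with a fixed sampling period, so each sample stands for roughly `period` retired instructions, and a block's execution count is approximated as its sample count times the period divided by the number of instructions in the block. The function name `estimate_block_counts` and the block layout are hypothetical, and the multi-event heuristics described above are not modeled.

```python
from bisect import bisect_right
from collections import Counter


def estimate_block_counts(sample_addrs, blocks, period):
    """Estimate basic-block execution counts from sampled addresses.

    sample_addrs: iterable of sampled instruction addresses.
    blocks: list of (start_addr, end_addr_exclusive, num_instructions),
            sorted by start address and non-overlapping (an assumption).
    period: number of retired instructions between samples (an assumption).
    Returns a dict mapping block start address to an estimated count.
    """
    starts = [start for start, _end, _n in blocks]
    hits = Counter()
    for addr in sample_addrs:
        # Find the last block whose start address is <= addr.
        i = bisect_right(starts, addr) - 1
        if i >= 0 and addr < blocks[i][1]:
            hits[i] += 1  # sample falls inside block i
        # Samples outside every known block are ignored.
    return {
        start: hits[i] * period / n_insns
        for i, (start, _end, n_insns) in enumerate(blocks)
    }


# Hypothetical layout: a 4-instruction block followed by a 2-instruction block.
example_blocks = [(0x1000, 0x1010, 4), (0x1010, 0x1018, 2)]
example_samples = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010, 0x1014, 0x2000]
example_counts = estimate_block_counts(example_samples, example_blocks, 1000)
```

Dividing by the block's instruction count matters because longer blocks attract proportionally more samples per execution; without that normalization, long blocks would look hotter than they are.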
