Paper information - Hardware-Based Profiling: An Effective Technique for Profile-Driven Optimization

Profile-based optimizations can be used for instruction scheduling, loop scheduling, data preloading, function inlining, and instruction cache performance enhancement. However, these techniques have not been embraced by software vendors because programs instrumented for profiling run significantly slower, an awkward compile-run-recompile sequence is required, and a test input suite must be collected and validated for each program. This paper introduces hardware-based profiling that uses traditional branch handling hardware to generate profile information in real time. Techniques are presented for both one-level and two-level branch hardware organizations. The approach produces high accuracy with small slowdown in execution (0.4%-4.6%). This allows a program to be profiled while it is used, eliminating the need for a test input suite. With contemporary processors driven increasingly by compiler support, hardware-based profiling is important for high-performance systems.
