Design and implementation of a configurable hardware profiler supporting path profiling and sampling

Profiling plays an important role in performance optimization tasks such as instruction set optimization and dynamic binary translation. Unfortunately, current profilers often fall short on two key attributes: accuracy and profiling time. In this paper, we introduce a configurable hardware path profiler, derived from previous work, that combines sampling with path profiling. The profiler consists of three modules, which respectively identify branches, detect paths, and store profile information. It is loosely coupled to the processor, so it can work with different processors. It uses instruction-level dynamic path profiling to accurately capture hot-path information from executing programs, and it supports multiple sampling policies to reduce profiling overhead. Through configuration, the profiler can apply different profiling policies and profile target programs either continuously or discretely. Experiments show that the profiler can reduce hardware timing to 6.4% while keeping accuracy up to 90%.
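As an illustration only, the interplay of path detection and sampling can be sketched in software. The following minimal Python model is not the hardware design described here: the function name `profile_paths`, the trace format, the rule that a path ends at a taken backward branch, and the on/off window sampling policy are all assumptions chosen for the example.

```python
from collections import Counter


def profile_paths(trace, sample_on=None, sample_off=0):
    """Count executed paths in a branch trace (software sketch, hypothetical API).

    trace: iterable of (pc, target, taken) tuples, one per executed branch.
    Assumption: a path ends at a taken backward branch (target <= pc).
    sample_on / sample_off: profile `sample_on` branches, then skip
    `sample_off` branches, repeatedly. sample_on=None means continuous
    profiling.
    """
    counts = Counter()   # stands in for the storage module
    current = []         # branch outcomes of the path being built
    seen = 0             # number of branches observed so far
    for pc, target, taken in trace:
        if sample_on is None:
            active = True
        else:
            active = (seen % (sample_on + sample_off)) < sample_on
        seen += 1
        if not active:
            current = []  # discard any partial path outside the window
            continue
        current.append((pc, taken))       # branch identification
        if taken and target <= pc:        # path detection
            counts[tuple(current)] += 1
            current = []
    return counts


# A loop body with one forward branch and one backward branch, run 4 times.
trace = [(0x10, 0x20, True), (0x30, 0x00, True)] * 4
continuous = profile_paths(trace)
sampled = profile_paths(trace, sample_on=2, sample_off=2)
```

In this toy trace, continuous profiling records every iteration of the loop path, while the sampled run records only the iterations that fall inside an active window, which is the trade-off between profiling time and accuracy that the abstract refers to.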
