Non-intrusive dynamic application profiler for detailed loop execution characterization

Application profiling - the process of monitoring an application to determine how frequently specific regions execute - is an essential step in the design of many software and hardware systems. In this paper, we present an efficient, non-intrusive dynamic application profiler (DAProf) that profiles an executing application by monitoring its short backward branches and provides detailed statistics characterizing loop execution behavior. DAProf is ideally suited to hardware/software partitioning approaches, which need detailed loop execution information to produce accurate performance estimates. DAProf achieves a profiling accuracy of greater than 90% with only an 11% area overhead compared to a small ARM9.
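As a rough software illustration of the idea, the sketch below counts loop executions and iterations from a trace of branch events by treating every short backward branch as a loop-closing branch. It is not DAProf's hardware design: the trace format `(branch_pc, target_pc, taken)`, the 256-byte cutoff for "short", and the names `profile_loops` and `MAX_OFFSET` are all assumptions made for this example.

```python
from collections import defaultdict

MAX_OFFSET = 256  # assumed cutoff (in bytes) for a "short" backward branch


def profile_loops(branch_trace, max_offset=MAX_OFFSET):
    """Summarize loop behavior from (branch_pc, target_pc, taken) events.

    A loop is identified by the address of its closing branch. One
    "execution" is a run of consecutive iterations that ends when the
    closing branch is not taken.
    """
    stats = defaultdict(lambda: {"executions": 0, "iterations": 0})
    active = set()  # loops that are currently mid-execution

    for branch_pc, target_pc, taken in branch_trace:
        # Only short backward branches are treated as loop-closing branches.
        if not (0 < branch_pc - target_pc <= max_offset):
            continue

        loop = stats[branch_pc]
        # Reaching the closing branch means one loop body has completed.
        loop["iterations"] += 1
        if branch_pc not in active:
            loop["executions"] += 1
            active.add(branch_pc)
        if not taken:
            # Fall-through: this execution of the loop has ended.
            active.discard(branch_pc)

    return {
        pc: {**s, "avg_iterations": s["iterations"] / s["executions"]}
        for pc, s in stats.items()
    }


# Hypothetical trace: one loop executed twice, plus two ignored branches.
example_trace = [
    (0x120, 0x100, True),
    (0x120, 0x100, True),
    (0x120, 0x100, False),
    (0x200, 0x240, True),    # forward branch, ignored
    (0x5000, 0x100, True),   # backward but not short, ignored
    (0x120, 0x100, True),
    (0x120, 0x100, False),
]
example_profile = profile_loops(example_trace)
```

Per-loop execution counts and average iterations per execution are the kind of statistics a hardware/software partitioner could use to estimate speedup, which is why the sketch reports them rather than raw branch frequencies alone.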
