Non-intrusive dynamic application profiler for detailed loop execution characterization
[1] John C. Gyllenhaal, et al. A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization, 1999, ISCA.
[2] Sujit Dey, et al. Common-case computation: a high-level technique for power and performance optimization, 1999, DAC '99.
[3] James R. Larus, et al. EEL: machine-independent executable editing, 1995, PLDI '95.
[4] Fadi J. Kurdahi, et al. A compiler framework for mapping applications to a coarse-grained reconfigurable computer architecture, 2001, CASES '01.
[5] Michael Gschwind, et al. Dynamic Binary Translation and Optimization, 2001, IEEE Trans. Computers.
[6] Ibrahim N. Hajj, et al. Energy and performance improvements in microprocessor design using a loop cache, 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No. 99CB37040).
[7] John Yates, et al. FX!32 a profile-directed binary translator, 1998, IEEE Micro.
[8] Zheng Wang, et al. System support for automatic profiling and optimization, 1997, SOSP.
[9] Kim M. Hazelwood, et al. A dynamic binary instrumentation engine for the ARM architecture, 2006, CASES '06.
[10] Brad Calder, et al. Value profiling, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[11] Ada Diaconescu, et al. Automatic performance management in component based software systems, 2004, International Conference on Autonomic Computing, 2004. Proceedings.
[12] Frank Vahid, et al. Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example, 2002, IEEE Computer Architecture Letters.
[13] Frank Vahid, et al. Warp Processors, 2004, ACM Trans. Design Autom. Electr. Syst.
[14] Susan L. Graham, et al. Gprof: A call graph execution profiler, 1982, SIGPLAN '82.
[15] S. Turner, et al. Performance Analysis Using the MIPS R10000 Performance Counters, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[16] Lance M. Berc, et al. Continuous profiling: where have all the cycles gone?, 1997, TOCS.
[17] John Arends, et al. Instruction fetch energy reduction using loop caches for embedded applications with small tight loops, 1999, ISLPED '99.
[18] John F. Keane, et al. A compiled accelerator for biological cell signaling simulations, 2004, FPGA '04.
[19] Frank Vahid, et al. Frequent loop detection using efficient nonintrusive on-chip hardware, 2005, IEEE Transactions on Computers.
[20] Daniel M. Yellin, et al. Competitive algorithms for the dynamic selection of component implementations, 2003, IBM Syst. J.
[21] Kees A. Vissers, et al. Optimized generation of data-path from C codes for FPGAs, 2005, Design, Automation and Test in Europe.
[22] Norman Rubin, et al. A Profile-Directed Binary Translator, 1998.
[23] Frank Vahid, et al. Frequent loop detection using efficient non-intrusive on-chip hardware, 2003, CASES '03.
[24] Brinkley Sprunt, et al. Pentium 4 Performance-Monitoring Features, 2002, IEEE Micro.
[25] Jeffrey Dean, et al. ProfileMe: hardware support for instruction-level profiling on out-of-order processors, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[26] Luca Benini, et al. Automatic source code specialization for energy reduction, 2001, ISLPED '01.
[27] Trevor Mudge, et al. MiBench: A free, commercially representative embedded benchmark suite, 2001.