HPS: hybrid profiling support

Key to understanding and optimizing complex applications is our ability to dynamically monitor executing programs with low overhead and high accuracy. Toward this end, we present HPS, a hybrid profiling support system. HPS employs a hardware/software approach to program sampling that transparently, efficiently, and dynamically samples an executing instruction stream. Our system is an extension and application of dynamic instruction stream editing (DISE), a hardware technique that macro-expands instructions in the pipeline decode stage at runtime. HPS toggles profiling on and off to sample the executing program as required by the profile consumer, for example a dynamic optimizer. Our system requires few hardware resources and introduces no "basic" overhead, that is, the cost of executing the instructions that trigger profiling. We use HPS to investigate the tradeoffs between overhead and accuracy for different profile types as well as different profiling schemes. In particular, we empirically evaluate hot data stream, hot call pair, and hot method identification using a number of parameterizations of bursty tracing, a popular sampling scheme used in dynamic optimization systems.
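As a rough illustration of the sampling scheme named above, the following is a minimal software sketch of bursty tracing's two-counter toggling, assuming the usual formulation in which execution alternates between an unprofiled phase and a short profiled burst. The names `BurstySampler`, `n_check0`, `n_instr0`, and `sample_trace` are hypothetical and chosen for this example only. The sketch does not model DISE or how HPS implements toggling in hardware.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, TypeVar

T = TypeVar("T")


@dataclass
class BurstySampler:
    """Illustrative two-counter sampler (hypothetical names, not HPS itself)."""

    n_check0: int  # check points spent unprofiled before each burst
    n_instr0: int  # check points profiled during each burst
    profiling: bool = False
    remaining: int = field(init=False)

    def __post_init__(self) -> None:
        if self.n_check0 <= 0 or self.n_instr0 <= 0:
            raise ValueError("both counters must be positive")
        # Start in the unprofiled phase with a full countdown.
        self.remaining = self.n_check0

    def check(self) -> bool:
        """Call at every check point; returns True if this one is profiled."""
        if self.remaining == 0:
            # The current phase is exhausted: toggle and reload the counter.
            self.profiling = not self.profiling
            self.remaining = self.n_instr0 if self.profiling else self.n_check0
        self.remaining -= 1
        return self.profiling


def sample_trace(events: Iterable[T], sampler: BurstySampler) -> List[T]:
    """Return only the events that fall inside a profiled burst."""
    return [event for event in events if sampler.check()]
```

Under these assumptions, the fraction of check points that are profiled tends toward `n_instr0 / (n_check0 + n_instr0)`, so varying the two counters is one way to trade profiling overhead against accuracy.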
