Analysis of path profiling information generated with performance monitoring hardware

Even with the breakthroughs in semiconductor technology that enables billion transistor designs, hardware-based architecture paradigms alone cannot substantially improve processor performance. The challenge in realizing the full potential of these future machines is to find ways to adapt program behavior to application needs and processor resources. As such, run-time optimization has a distinct role in future high performance systems. However, as these systems are dependent on accurate, fine-grain profile information, traditional approaches to collecting profiles at run-time result in significant slowdowns during program execution. A novel approach to low-overhead profiling is to exploit hardware performance monitoring units (PMUs) present in modern microprocessors. The Itanium-2 PMU can periodically sample the last few taken branches in an executing program and this information can be used to recreate partial paths of execution. With compiler-aided analysis, the partial paths can be correlated into full paths. As statistically hot paths are most likely to occur in PMU samples, even infrequent sampling can accurately identify these paths. While traditional path profiling techniques carry a high overhead, a PMU-based path profiler represents an effective lightweight profiling alternative. This paper characterizes the PMU-based path information and demonstrates the construction of such a PMU-based path profiler for a run-time system.

[1]  Lance M. Berc,et al.  Continuous profiling: where have all the cycles gone? , 1997, TOCS.

[2]  Wei-Chung Hsu,et al.  Dynamic trace selection using performance monitoring hardware sampling , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[3]  Scott A. Mahlke,et al.  The superblock: An effective technique for VLIW and superscalar compilation , 1993, The Journal of Supercomputing.

[4]  Thomas Ball,et al.  Edge profiling versus path profiling: the showdown , 1998, POPL '98.

[5]  Michael Franz,et al.  Continuous Program Optimization: Design and Evaluation , 2001, IEEE Trans. Computers.

[6]  Michael Franz,et al.  Continuous program optimization , 1999 .

[7]  James R. Larus,et al.  Efficient path profiling , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[8]  David W. Wall Predicting program behavior using real or estimated profiles , 1991, PLDI '91.

[9]  Wei-Chung Hsu,et al.  Design and Implementation of a Lightweight Dynamic Optimization System , 2004, J. Instr. Level Parallelism.

[10]  Michael D. Bond,et al.  Practical path profiling for dynamic optimizers , 2005, International Symposium on Code Generation and Optimization.

[11]  Jack W. Davidson,et al.  Profile guided code positioning , 1990, SIGP.

[12]  Michael D. Bond,et al.  Targeted path profiling: lower overhead path profiling for staged dynamic optimization systems , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[13]  Wen-mei W. Hwu,et al.  Inline function expansion for compiling C programs , 1989, PLDI '89.

[14]  Wen-mei W. Hwu,et al.  A hardware mechanism for dynamic extraction and relayout of program hot spots , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[15]  Amitabh Srivastava,et al.  Analysis Tools , 2019, Public Transportation Systems.

[16]  David W. Wall,et al.  Predicting program behavior using real or estimated profiles , 2004, SIGP.

[17]  James R. Larus,et al.  Optimally profiling and tracing programs , 1994, TOPL.

[18]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.