Exploring the Potential of Performance Monitoring Hardware to Support Run-time Optimization

The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Run-time optimization is the process of dynamically modifying an application's characteristics to promote desirable execution behavior. Because a wealth of information is available at run time that is unavailable to static compiler analysis, run-time optimization has substantially more potential to fully utilize processor resources. A critical component of a run-time optimization system is the run-time profiler, which must accurately capture specific aspects of application execution behavior while maintaining low overhead. Unfortunately, most existing profiling approaches cannot meet these constraints and therefore cannot feasibly be deployed in a run-time optimization system. While modern microprocessors can collect run-time information through on-chip Hardware Performance Monitoring (HPM) support, it is not clear whether this technology can effectively guide a run-time optimization framework. To date, the HPM information of various processor systems has been used almost solely in post-execution performance tools.

This thesis evaluates the potential of performance monitoring hardware to support profiling for run-time optimization. The trade-offs in meeting the constraints imposed by a run-time environment are analyzed by evaluating various sampling rates and analysis techniques. Altogether, the thesis characterizes the amount of information available through sampling the performance monitoring unit (PMU), as well as the extent to which compiler analysis can extend PMU information. Path profiling and code coverage analysis, important elements of run-time optimization, are evaluated to demonstrate the effectiveness of run-time profiling with hardware support.

Dedication

To my father, mother, and brother.
Acknowledgements

First and foremost, I would like to thank my advisor Dan Connors for his guidance in this work. With his deep insights, generous advice, and continuous encouragement, I have learned and accomplished more than I could have imagined when I first began graduate school. I look forward to continuing my research with him as I pursue a Ph.D. degree. I would like to acknowledge Andrew Pleszkun and Manish Vachharajani, members of my defense committee, for their valuable advice and suggestions. I express my thanks to the entire DRACO research group. Matthew Iyer was instrumental to the development of ideas as well as the implementation of the work in this thesis. Alex Settle mentored me early on and was extremely helpful in teaching me research tools and skills. Vijay Janapa Reddi provided me with a lot of …
