Selecting software phase markers with code structure analysis

Most programs are repetitive: similar behavior recurs at different points in execution. Algorithms have been proposed that automatically group similar portions of a program's execution into phases, where samples of execution in the same phase have homogeneous behavior and similar resource requirements. In this paper, we present an automated profiling approach to identify code locations whose executions correlate with phase changes. These "software phase markers" can be used to easily detect phase changes across different inputs to a program without hardware support. Our approach builds a combined hierarchical procedure call and loop graph to represent a program's execution, where each edge also tracks the maximum, average, and standard deviation of the hierarchical execution variability on paths from that edge. We search this annotated call-loop graph for instructions in the binary that accurately identify the start of unique stable behaviors across different inputs. We show that our phase markers can be used to accurately partition execution into units of repeating homogeneous behavior by counting execution cycles and data cache hits. We also compare the use of our software markers to prior work on guiding data cache reconfiguration using data-reuse markers. Finally, we show that the phase markers can be used to partition the program's execution at code transitions to accurately pick simulation points for SimPoint. When simulation points are defined in terms of phase markers, they can potentially be reused across inputs, compiler optimizations, and different instruction set architectures for the same source code.
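The edge annotations described above can be illustrated with a minimal sketch. It assumes each call-loop graph edge records the number of instructions executed per traversal, and that a candidate marker is an edge whose traversals are long enough and vary little. The names `EdgeStats`, `select_markers`, and `build_example`, the thresholds, and the selection rule are all hypothetical and are not taken from the paper's algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Running statistics for one call-loop graph edge (hypothetical layout)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0   # running sum of squared deviations from the mean
    max_len: int = 0

    def record(self, instructions: int) -> None:
        # Welford's online update: no per-traversal history is stored.
        self.count += 1
        delta = instructions - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (instructions - self.mean)
        self.max_len = max(self.max_len, instructions)

    @property
    def stddev(self) -> float:
        # Population standard deviation of the recorded traversal lengths.
        return math.sqrt(self.m2 / self.count) if self.count else 0.0

    @property
    def cov(self) -> float:
        # Coefficient of variation: standard deviation relative to the mean.
        return self.stddev / self.mean if self.mean else 0.0


def select_markers(edges, min_avg_len, max_cov):
    """Return edges that are long on average and stable (assumed criterion)."""
    return sorted(
        name
        for name, stats in edges.items()
        if stats.count > 0 and stats.mean >= min_avg_len and stats.cov <= max_cov
    )


def build_example():
    """Build a toy annotated graph; edge names and lengths are made up."""
    samples = {
        ("main", "solve"): [1000, 1010, 990],   # long and stable
        ("solve", "loop@0x40"): [10, 500, 20],  # highly variable
        ("main", "init"): [5, 5, 5],            # stable but too short
    }
    edges = {}
    for name, lengths in samples.items():
        stats = EdgeStats()
        for n in lengths:
            stats.record(n)
        edges[name] = stats
    return edges


if __name__ == "__main__":
    print(select_markers(build_example(), min_avg_len=100, max_cov=0.05))
```

In this toy graph only the `("main", "solve")` edge passes both thresholds. A real profiler would obtain the traversal lengths from instrumentation of the binary rather than from hard-coded samples.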
