Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management

Computer architecture has experienced a major paradigm shift from focusing only on raw performance to treating power-performance efficiency as the defining factor of emerging systems. Along with this shift has come increased interest in workload characterization. This interest fuels two closely related areas of research. First, various studies explore the properties of workload variations and develop methods to identify and track different execution behaviors, commonly referred to as "phase analysis". Second, a large complementary body of research studies dynamic, on-the-fly system management techniques that adapt to these differences in application behavior. Both lines of work have produced widely useful results. Thus far, however, there has been only a weak link between these conceptually related areas, especially in real-system studies. Our work aims to strengthen this link by demonstrating a real-system implementation of a runtime phase predictor that works cooperatively with on-the-fly dynamic management. We describe a fully functional deployed system that performs accurate phase predictions on running applications. The key insight of our approach is to borrow from prior branch predictor designs to build a phase history table that guides predictions. To demonstrate the value of this approach, we implement a prototype system that uses it to guide dynamic voltage and frequency scaling (DVFS). Our runtime phase prediction methodology achieves prediction accuracies above 90% for many of the evaluated benchmarks. For highly variable applications, our approach can reduce mispredictions by more than 6X compared with commonly used statistical approaches. When guided by our runtime phase predictor, DVFS achieves energy-delay product (EDP) improvements as high as 34% for benchmarks with non-negligible variability, on average 7% better than previous methods and 18% better than an unmanaged baseline system.
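The branch-predictor analogy can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' implementation. A global history register of the last `depth` phase IDs indexes a pattern table. That table records which phase followed each history pattern the last time it was seen, uses LRU eviction, and falls back to last-value prediction on a miss. The class and function names (`HistoryTablePhasePredictor`, `LastValuePredictor`, `mispredictions`), the history depth of 8, the table size of 128, and the use of small integers as phase IDs are all hypothetical. The last-value predictor is only a stand-in for the simpler statistical predictors mentioned above.

```python
from collections import OrderedDict, deque
from typing import Deque, List, Optional, Protocol, Tuple


class PhasePredictor(Protocol):
    def predict(self) -> Optional[int]: ...
    def update(self, actual: int) -> None: ...


class LastValuePredictor:
    """Hypothetical baseline: the next phase equals the last observed one."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def predict(self) -> Optional[int]:
        return self.last

    def update(self, actual: int) -> None:
        self.last = actual


class HistoryTablePhasePredictor:
    """Hypothetical branch-predictor-style phase predictor (sketch only).

    `history` acts as a global history register of the last `depth` phase
    IDs; `table` maps each full history pattern to the phase that followed
    it the last time that pattern was seen.
    """

    def __init__(self, depth: int = 8, table_size: int = 128) -> None:
        self.depth = depth
        self.table_size = table_size
        self.history: Deque[int] = deque(maxlen=depth)
        # OrderedDict order doubles as LRU order for eviction.
        self.table: "OrderedDict[Tuple[int, ...], int]" = OrderedDict()

    def predict(self) -> Optional[int]:
        if not self.history:
            return None  # no observations yet
        key = tuple(self.history)
        if len(key) == self.depth and key in self.table:
            self.table.move_to_end(key)  # mark entry as recently used
            return self.table[key]
        return self.history[-1]  # table miss: fall back to last value

    def update(self, actual: int) -> None:
        if len(self.history) == self.depth:
            key = tuple(self.history)
            self.table[key] = actual  # remember what followed this pattern
            self.table.move_to_end(key)
            if len(self.table) > self.table_size:
                self.table.popitem(last=False)  # evict least recently used
        self.history.append(actual)  # shift in the observed phase


def mispredictions(phases: List[int], predictor: PhasePredictor) -> int:
    """Count samples whose predicted phase differs from the observed one."""
    misses = 0
    for actual in phases:
        guess = predictor.predict()
        if guess is not None and guess != actual:
            misses += 1
        predictor.update(actual)
    return misses


if __name__ == "__main__":
    trace = [0, 0, 1, 2] * 50  # synthetic, strictly periodic phase IDs
    print(mispredictions(trace, HistoryTablePhasePredictor(depth=8)))  # 8
    print(mispredictions(trace, LastValuePredictor()))  # 149
```

The example uses a synthetic trace with a period of 4. On it, the history-table predictor mispredicts only while its table warms up (8 misses). Last-value prediction misses at every phase transition (149 misses). This illustrates why recurring phase patterns favor history-based prediction. It says nothing about the measured results reported above.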
