Online strategies for high-performance power-aware thread execution on emerging multiprocessors

Granularity control is an effective means for trading power consumption with performance on dense shared memory multiprocessors, such as multi-SMT and multi-CMP systems. With granularity control, the number of threads used to execute an application, or part of an application, is changed, thereby also changing the amount of work done by each active thread. In this paper, we analyze the energy/performance trade-off of varying thread granularity in parallel benchmarks written for shared memory systems. We use physical experimentation on a real multi-SMT system and a power estimation model based on the die areas of processor components and component activity factors obtained from a hardware event monitor. We also present HPPATCH, a runtime algorithm for live tuning of thread granularity, which attempts to simultaneously reduce both execution time and processor power consumption

[1]  Jian Li,et al.  Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[2]  Michael Voss,et al.  Runtime empirical selection of loop schedulers on hyperthreaded SMPs , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[3]  Balaram Sinharoy,et al.  IBM Power5 chip: a dual-core multithreaded processor , 2004, IEEE Micro.

[4]  Marc Snir,et al.  GETTING UP TO SPEED THE FUTURE OF SUPERCOMPUTING , 2004 .

[5]  Norman P. Jouppi,et al.  Conjoined-Core Chip Multiprocessing , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[6]  David F. Heidel,et al.  An Overview of the BlueGene/L Supercomputer , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[7]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[8]  David K. Lowenthal,et al.  Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs , 2005 .

[9]  Margaret Martonosi,et al.  Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data , 2003, MICRO.

[10]  Frank Bellosa,et al.  Process cruise control: event-driven clock scaling for dynamic power management , 2002, CASES '02.

[11]  Rong Ge,et al.  Improvement of power-performance efficiency for high-end computing , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[12]  Andrew Gilliam Tucker,et al.  Efficient Scheduling on Multiprogrammed Shared-Memory Multiprocessors , 1994 .

[13]  Kevin Skadron,et al.  Performance, energy, and thermal considerations for SMT and CMP architectures , 2005, 11th International Symposium on High-Performance Computer Architecture.

[14]  Wei Zhang,et al.  Runtime code parallelization for on-chip multiprocessors , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[15]  M. Singh,et al.  A dual-core 64 b UltraSPARC microprocessor for dense server applications , 2004, 2004 IEEE International Solid-State Circuits Conference (IEEE Cat. No.04CH37519).

[16]  Ahmad Zmily,et al.  Energy-efficient and high-performance instruction fetch using a block-aware ISA , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..

[17]  David K. Lowenthal,et al.  Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[18]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[19]  Kevin Skadron,et al.  Using performance counters for runtime temperature sensing in high-performance processors , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[20]  Rohit Bhatia,et al.  Montecito: a dual-core, dual-thread Itanium processor , 2005, IEEE Micro.