Prediction models for multi-dimensional power-performance optimization on many cores

Power has become a primary concern for HPC systems. Dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) are two software tools (or knobs) for reducing the dynamic power consumption of HPC systems. To date, few works have considered the synergistic integration of DVFS and DCT in performance-constrained systems, and, to the best of our knowledge, no prior research has developed application-aware simultaneous DVFS and DCT controllers in real systems and parallel programming frameworks. We present a multi-dimensional, online performance predictor, which we deploy to address the problem of simultaneous runtime optimization of DVFS and DCT on multi-core systems. We present results from an implementation of the predictor in a runtime library linked to the Intel OpenMP environment and running on an actual dual-processor quad-core system. We show that our predictor derives near-optimal settings of the power-aware program adaptation knobs that we consider. Our overall framework achieves significant reductions in energy (19% mean) and ED2 (40% mean), through simultaneous power savings (6% mean) and performance improvements (14% mean). We also find that our framework outperforms earlier solutions that adapt only DVFS or DCT, as well as one that sequentially applies DCT then DVFS. Further, our results indicate that prediction-based schemes for runtime adaptation compare favorably and typically improve upon heuristic search-based approaches in both performance and energy savings.

[1]  Dimitrios S. Nikolopoulos,et al.  Online power-performance adaptation of multithreaded programs using hardware event-based prediction , 2006, ICS '06.

[2]  Jack J. Dongarra,et al.  A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[3]  Frank Bellosa,et al.  Balancing power consumption in multiprocessor systems , 2006, EuroSys.

[4]  Jian Li,et al.  Dynamic power-performance adaptation of parallel computation on chip multiprocessors , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[5]  Rong Ge,et al.  High-performance, power-aware distributed computing for scientific applications , 2005, Computer.

[6]  Rong Ge,et al.  CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters , 2007, 2007 International Conference on Parallel Processing (ICPP 2007).

[7]  Margaret Martonosi,et al.  An Analysis of Efficient Multi-Core Global Power Management Policies: Maximizing Performance for a Given Power Budget , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[8]  Hiroshi Nakamura,et al.  Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS , 2007, CARN.

[9]  Dirk Grunwald,et al.  Methods for modeling resource contention on simultaneous multithreading processors , 2005, 2005 International Conference on Computer Design.

[10]  Ricardo Bianchini,et al.  Limiting the power consumption of main memory , 2007, ISCA '07.

[11]  Feng Pan,et al.  Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications , 2007, IEEE Transactions on Parallel and Distributed Systems.

[12]  Mani Azimi,et al.  Integration Challenges and Tradeoffs for Terascale Architectures , 2007 .

[13]  James E. Smith,et al.  A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[14]  Wu-chun Feng,et al.  A Power-Aware Run-Time System for High-Performance Computing , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[15]  Margaret Martonosi,et al.  Formal online methods for voltage/frequency control in multiple clock domain microprocessors , 2004, ASPLOS XI.

[16]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[17]  Gurindar S. Sohi,et al.  A Case for an Over-provisioned Multicore System: Energy Efficient Processing of Multithreaded Programs , 2007 .

[18]  Ivan B. Ganev,et al.  Re-architecting VMMs for Multicore Systems : The Sidecore Approach , 2007 .

[19]  Kevin Skadron,et al.  CMP design space exploration subject to physical constraints , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[20]  Margaret Martonosi,et al.  Live, Runtime Phase Monitoring and Prediction on Real Systems with Application to Dynamic Power Management , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[21]  Michael Voss,et al.  Runtime empirical selection of loop schedulers on hyperthreaded SMPs , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[22]  References , 1971 .

[23]  Kirk W. Cameron,et al.  Tempest: A portable tool to identify hot spots in parallel code , 2007, 2007 International Conference on Parallel Processing (ICPP 2007).

[24]  Dimitrios S. Nikolopoulos,et al.  Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes , 2008, IEEE Transactions on Parallel and Distributed Systems.

[25]  Laxmikant V. Kalé,et al.  Adaptive MPI , 2003, LCPC.

[26]  Margaret Martonosi,et al.  Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data , 2003, MICRO.

[27]  Yuanyuan Zhou,et al.  Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures , 2007, SIGMETRICS '07.

[28]  Frank Bellosa,et al.  Process cruise control: event-driven clock scaling for dynamic power management , 2002, CASES '02.

[29]  David K. Lowenthal,et al.  Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster , 2006, PPoPP '06.

[30]  Bruce Jacob,et al.  A control-theoretic approach to dynamic voltage scheduling , 2003, CASES '03.

[31]  David K. Lowenthal,et al.  Using multiple energy gears in MPI programs on a power-scalable cluster , 2005, PPoPP.

[32]  Rong Ge,et al.  Performance-constrained Distributed DVS Scheduling for Scientific Applications on Power-aware Clusters , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[33]  Sally A. McKee,et al.  Methods of inference and learning for performance modeling of parallel applications , 2007, PPoPP.

[34]  Margaret Martonosi,et al.  Dynamic-Compiler-Driven Control for Microprocessor Energy and Performance , 2006, IEEE Micro.

[35]  Ricardo Bianchini,et al.  Conserving disk energy in network servers , 2003, ICS '03.

[36]  Mahmut T. Kandemir,et al.  Exploiting barriers to optimize power consumption of CMPs , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.

[37]  Hiroshi Nakamura,et al.  An intra-task dvfs technique based on statistical analysis of hardware events , 2007, CF '07.

[38]  Jaejin Lee,et al.  Adaptive execution techniques for SMT multiprocessor architectures , 2005, PPOPP.