Hybrid MPI/OpenMP power-aware computing

Power-aware execution of parallel programs is now a primary concern in large-scale HPC environments. Prior research in this area has explored models and algorithms based on dynamic voltage and frequency scaling (DVFS) and dynamic concurrency throttling (DCT) to achieve power-aware execution of programs written in a single programming model, typically MPI or OpenMP. However, hybrid programming models combining MPI and OpenMP are growing in popularity as emerging large-scale systems have many nodes with several processors per node and multiple cores per process or. In th is paper we present and evaluate solutions for power-efficient execution of programs written in this hybrid model targeting large-scale distributed systems with multicore nodes. We use a new power-aware performance prediction model of hybrid MPI/OpenMP applications to derive a novel algorithm for power-efficient execution of realis tic applications from th e ASCS equoia and N PB MZ bench marks. Our new algorithm yields substantial energy savings (4.18% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.2%).

[1]  David K. Lowenthal,et al.  Minimizing execution time in MPI programs on an energy-constrained, power-scalable cluster , 2006, PPoPP '06.

[2]  David K. Lowenthal,et al.  Using multiple energy gears in MPI programs on a power-scalable cluster , 2005, PPoPP.

[3]  David K. Lowenthal,et al.  Just In Time Dynamic Voltage Scaling: Exploiting Inter-Node Slack to Save Energy in MPI Programs , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[4]  Dimitrios S. Nikolopoulos,et al.  Online Power-Performance Adaptation of Multithreaded Programs using Event-Based Prediction , 2006 .

[5]  Dimitrios S. Nikolopoulos,et al.  Online power-performance adaptation of multithreaded programs using hardware event-based prediction , 2006, ICS '06.

[6]  Rong Ge,et al.  Performance-constrained Distributed DVS Scheduling for Scientific Applications on Power-aware Clusters , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[7]  Martin Schulz,et al.  Bounding energy consumption in large-scale MPI programs , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[8]  V. E. Henson,et al.  BoomerAMG: a parallel algebraic multigrid solver and preconditioner , 2002 .

[9]  Kevin Skadron,et al.  Multi-mode energy management for multi-tier server clusters , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[10]  Ragunathan Rajkumar,et al.  Critical power slope: understanding the runtime effects of frequency scaling , 2002, ICS '02.

[11]  Haoqiang Jin,et al.  Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks , 2004, IPDPS.

[12]  Wu-chun Feng,et al.  A Power-Aware Run-Time System for High-Performance Computing , 2005, ACM/IEEE SC 2005 Conference (SC'05).

[13]  Trevor N. Mudge,et al.  Power: A First-Class Architectural Design Constraint , 2001, Computer.

[14]  William Gropp,et al.  Reproducible Measurements of MPI Performance Characteristics , 1999, PVM/MPI.

[15]  William Gropp MPI and Hybrid Programming Models for Petascale Computing , 2008, PVM/MPI.

[16]  Gerhard Wellein,et al.  Communication and Optimization Aspects of Parallel Programming Models on Hybrid Architectures , 2003, Int. J. High Perform. Comput. Appl..

[17]  Bronis R. de Supinski,et al.  Prediction models for multi-dimensional power-performance optimization on many cores , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[18]  Bronis R. de Supinski,et al.  Adagio: making DVS practical for complex HPC applications , 2009, ICS.

[19]  Yale N. Patt,et al.  Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs , 2008, ASPLOS.

[20]  Bernd Mohr,et al.  Design and Prototype of a Performance Tool Interface for OpenMP , 2002, The Journal of Supercomputing.

[21]  Dimitrios S. Nikolopoulos,et al.  Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes , 2008, IEEE Transactions on Parallel and Distributed Systems.

[22]  Margaret Martonosi,et al.  Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data , 2003, MICRO.

[23]  Frank Bellosa,et al.  Task activity vectors: a new metric for temperature-aware scheduling , 2008, Eurosys '08.

[24]  Paolo Toth,et al.  Knapsack Problems: Algorithms and Computer Implementations , 1990 .

[25]  Haoqiang Jin,et al.  Performance characteristics of the multi-zone NAS parallel benchmarks , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..