Power-aware predictive models of hybrid (MPI/OpenMP) scientific applications on multicore systems

Predictive models enable a better understanding of the performance characteristics of applications on multicore systems. Previous work has used performance counters in a system-centered approach to model the power consumption of the system, CPU, and memory components, often applying the same group of counters across different applications. In contrast, we develop application-centric models, based on performance counters, for the runtime and for the power consumption of the system, CPU, and memory components. Our work analyzes four hybrid (MPI/OpenMP) applications: the NAS Parallel Multi-Zone Benchmarks (BT-MZ, SP-MZ, LU-MZ) and the Gyrokinetic Toroidal Code (GTC). Our models show that cache utilization (L1/L2), branch instructions, TLB data misses, and system resource stalls affect each application and each performance component differently. We also show that the L2 total cache hits counter affects performance across all applications. The models are validated against the system and component power measurements with an error rate of less than 3%.
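As an illustration of what a counter-based power model can look like, the following minimal sketch fits a linear model of power against a few normalized counter rates and reports the mean absolute percentage error. The abstract does not state the fitting method, so ordinary least squares is an assumption here; the counter names, the coefficients in TRUE_COEF, and all sample values are hypothetical and synthetic, not measurements from this work.

```python
# Hypothetical sketch: power = b0 + b1*l2_hits + b2*branches + b3*tlb_misses,
# fitted by ordinary least squares (an assumed method, not the paper's).

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(rows, targets):
    """Return [intercept, b1, ..., bk] from the normal equations (X^T X) b = X^T y."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    """Evaluate the linear model for one sample of counter rates."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def mean_abs_pct_error(coef, rows, targets):
    """Mean absolute percentage error of the model over the given samples."""
    errs = [abs(predict(coef, r) - y) / y for r, y in zip(rows, targets)]
    return 100.0 * sum(errs) / len(errs)

# Synthetic samples: (l2_hits, branches, tlb_misses) as normalized rates (made up).
ROWS = [
    (0.1, 0.9, 0.5), (0.8, 0.2, 0.1), (0.3, 0.3, 0.9),
    (0.6, 0.7, 0.2), (0.9, 0.5, 0.6), (0.2, 0.1, 0.3),
]
# Made-up "true" coefficients used only to generate noise-free synthetic power values.
TRUE_COEF = [100.0, 50.0, 20.0, 30.0]
POWER = [predict(TRUE_COEF, r) for r in ROWS]

coef = fit(ROWS, POWER)
err = mean_abs_pct_error(coef, ROWS, POWER)
print("fitted coefficients:", [round(c, 3) for c in coef])
print("mean abs pct error:", err)
```

Because the synthetic power values are generated without noise, the fit should recover the generating coefficients almost exactly; with real measurements the error would be evaluated on data not used for fitting, and the set of counters would be chosen separately for each application.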
