Exploiting performance counters to predict and improve energy performance of HPC systems

Hardware monitoring through performance counters is available on almost all modern processors. Although these counters were originally designed for performance tuning, they have also been used to evaluate power consumption. We propose two approaches that rely on hardware monitoring counters to model and understand the behaviour of high performance computing (HPC) systems. We evaluate the effectiveness of our system modelling approach against two target objectives: optimising the energy usage of HPC systems and predicting the energy consumption of HPC applications. Hardware monitoring counters are used to model the system, while other methods, including partial phase recognition and cross-platform energy prediction, are used for energy optimisation and prediction. Experimental results for energy prediction demonstrate that we can accurately predict the peak energy consumption of an application on a target platform, whereas results for energy optimisation indicate that, with no a priori knowledge of the workloads sharing the platform, we can save up to 24% of the overall HPC system's energy consumption under benchmarks and real-life workloads.
