Unified performance and power modeling of scientific workloads

Scientific applications running on future large-scale HPC systems are expected to require optimization not only for performance but also for power consumption. As power and energy become increasingly constrained resources, researchers and developers need tools that accurately predict both performance and power consumption. Reasoning about the two in concert will be critical for making the most of limited resources on future HPC systems. To this end, we present a unified performance and power model for the Nek-Bone mini-application developed as part of the DOE's CESAR Exascale Co-Design Center. Our models treat computation, point-to-point communication, and collective communication individually and quantitatively predict the impact of each on both performance and energy efficiency. We describe our modeling methodology and present validation results that indicate the accuracy of these models on currently available HPC system architectures.
