Understanding Power and Energy Utilization in Large Scale Production Physics Simulation Codes

Power is an often-cited motivation for moving to advanced architectures on the path to Exascale computing, driven both by the practical challenge of delivering enough power to site and operate these machines and by concerns over energy usage while running large simulations. Since accurate power measurements can be difficult to obtain, processor thermal design power (TDP) is a tempting surrogate because of its simplicity and availability. However, TDP is not indicative of typical power usage while running simulations. Using commodity and advanced technology systems at Lawrence Livermore National Laboratory (LLNL) and Sandia National Laboratories, we performed a series of experiments to measure power and energy usage while running simulation codes. These experiments indicate that large-scale LLNL simulation codes are significantly more power-efficient than a simple processor TDP model would suggest.
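
To make the TDP-versus-measurement distinction concrete, the sketch below contrasts an energy estimate built from nameplate TDP with one built from a measured average node power. This is a minimal illustration, not the paper's methodology: the node count, run length, and per-node power figures are all hypothetical values chosen only to show how the two estimates can diverge.

```python
# Minimal sketch: energy from a TDP-based model vs. from measured average
# power draw. All numeric inputs below are hypothetical placeholders.

def energy_kwh(power_w_per_node: float, nodes: int, hours: float) -> float:
    """Total energy in kilowatt-hours for a run at constant per-node power."""
    return power_w_per_node * nodes / 1000.0 * hours

# Hypothetical run: 1024 nodes for 12 hours.
nodes, hours = 1024, 12.0

# TDP-based upper bound: e.g., two sockets at 150 W TDP each (assumed values).
tdp_estimate = energy_kwh(2 * 150.0, nodes, hours)

# Energy from a measured average per-node draw, e.g., reported by out-of-band
# power sensors during the run (assumed value, well below the TDP sum).
measured = energy_kwh(210.0, nodes, hours)

print(f"TDP-based estimate: {tdp_estimate:,.0f} kWh")
print(f"Measured estimate:  {measured:,.0f} kWh")
print(f"Measured / TDP:     {measured / tdp_estimate:.2f}")
```

If measured draw sits well below the summed TDP, as in this toy example, a TDP-only model systematically overstates the energy cost of a simulation campaign.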
