Characterizing the Impact of Program Optimizations on Power and Energy for Explicit Hydrodynamics

With the end of Dennard scaling, future systems will be constrained by power and energy. Application developers will therefore need to restructure and optimize their algorithms with respect to these resources. In this paper, we analyze the impact of different code optimizations on power, energy, and execution time. The optimizations we study are loop fusion, data structure transformations, global allocation, and compiler selection. We analyze the static and dynamic components of power and energy for the processor chip and memory domains of a system. In addition, we correlate changes in energy and power with performance events and show that data motion is highly correlated with memory power and energy usage, while executed instructions are only partially correlated with processor power and energy. Our results demonstrate key tradeoffs among power, energy, and execution time for explicit hydrodynamics, using a representative kernel. In particular, we observe that loop fusion and compiler selection improve all three objectives, while global allocation and data layout transformations present tradeoffs that depend on the objective.
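As a minimal illustration of loop fusion, one of the optimizations named above, the sketch below contrasts two separate passes over an array with a single fused pass. It is not the kernel studied in the paper; the function names `unfused` and `fused` and the arithmetic are hypothetical, chosen only to show how fusion removes a temporary array and the data motion that comes with writing and re-reading it.

```python
def unfused(x, y):
    """Two separate loops: the first writes a temporary array,
    the second reads it back to produce the result."""
    n = len(x)
    tmp = [0.0] * n
    for i in range(n):
        tmp[i] = x[i] * x[i]      # pass 1: write tmp to memory
    out = [0.0] * n
    for i in range(n):
        out[i] = tmp[i] + y[i]    # pass 2: read tmp back from memory
    return out


def fused(x, y):
    """One fused loop: the intermediate value is consumed immediately,
    so no temporary array is allocated or traversed."""
    n = len(x)
    out = [0.0] * n
    for i in range(n):
        out[i] = x[i] * x[i] + y[i]
    return out


# Hypothetical input data, for illustration only.
x = [1.0, 2.0, 3.0]
y = [0.5, 0.5, 0.5]

# Both versions compute the same values; only the memory traffic differs.
assert unfused(x, y) == fused(x, y)
```

Both functions perform the same arithmetic per element, so any difference in memory behavior comes from the eliminated temporary array rather than from a change in the computation.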
