Program optimizations: The interplay between power, performance, and energy

We analyze the power and energy effects of program optimizations. The analysis relies on studies of each application phase and each system component. We provide guidance on tradeoffs when tuning for performance, power, and energy. We identify correlations between energy and runtime for optimizations on three architectures. Multi-objective optimization requires analysis per component and per application phase.

Practical considerations for future supercomputer designs will impose limits on both instantaneous power consumption and total energy consumption. To work within these constraints while delivering the maximum possible performance, application developers will need to optimize their code for speed alongside power and energy. This paper analyzes the effectiveness of several code optimizations, including loop fusion, data structure transformations, and global allocations. We measure and analyze different architectures per component, which lets us examine the effect of code optimizations on different compute subsystems. Using LULESH, an explicit hydrodynamics proxy application from the U.S. Department of Energy, we show how code optimizations affect different computational phases of the simulation. This gives simulation developers insight into which optimizations to apply during particular compute phases when optimizing code for future supercomputing platforms. We examine and contrast x86 and Blue Gene architectures with respect to these optimizations.
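As an illustration of one of the optimizations named above, the following is a minimal sketch of loop fusion. It is not taken from LULESH: the function names `update_unfused` and `update_fused`, the arrays `pos`, `vel`, and `acc`, and the time step `dt` are hypothetical and chosen only to show the shape of the transformation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example, not LULESH code.
// Unfused version: two separate passes. Each vel[i] is written in the
// first loop and must be read again in the second loop.
void update_unfused(std::vector<double>& pos, std::vector<double>& vel,
                    const std::vector<double>& acc, double dt) {
    assert(pos.size() == vel.size() && vel.size() == acc.size());
    for (std::size_t i = 0; i < vel.size(); ++i) {
        vel[i] += acc[i] * dt;
    }
    for (std::size_t i = 0; i < pos.size(); ++i) {
        pos[i] += vel[i] * dt;
    }
}

// Fused version: one pass. The freshly updated vel[i] is reused
// immediately, so the arrays are traversed once instead of twice.
void update_fused(std::vector<double>& pos, std::vector<double>& vel,
                  const std::vector<double>& acc, double dt) {
    assert(pos.size() == vel.size() && vel.size() == acc.size());
    for (std::size_t i = 0; i < vel.size(); ++i) {
        vel[i] += acc[i] * dt;
        pos[i] += vel[i] * dt;
    }
}
```

Both functions perform the same arithmetic per element in the same order; only the number of passes over memory differs. Whether the fused form saves time, power, or energy on a given machine is the kind of question the per-component measurements are meant to answer.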
