Energy Auto-Tuning using the Polyhedral Approach

As the HPC community moves into the exascale computing era, application energy has become a major concern. Tuning for energy will be essential in the effort to stay within a limited power envelope. How is tuning for lower energy related to tuning for faster execution? Understanding that relationship can guide both performance and energy tuning for exascale. In this paper, we present a strong correlation between the two that allows tuning for execution time to be used as a proxy for energy tuning. We also show that polyhedral compilers can effectively tune a realistic application for both time and energy. For a large number of variants of the Polybench programs and LULESH, energy consumption is strongly correlated with total execution time. Although optimizations can increase the power and energy required from one variant to another, the variant with the minimum execution time also has the lowest energy usage. The polyhedral framework was also used to optimize a 2D cardiac wave propagation simulation application. Various loop optimizations, including fusion, tiling, vectorization, and auto-parallelization, achieved a 20% speedup over the baseline OpenMP implementation, with an equivalent reduction in energy, on an Intel Sandy Bridge system. On an Intel Xeon Phi system, reductions of up to 21% in execution time and 19% in energy are obtained.
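The proxy claim above can be checked on any set of measured variants with a few lines of analysis. The following is a minimal sketch, not the paper's tooling: the function name `time_energy_summary`, the input layout (variant name mapped to seconds and joules), and the sample numbers are all assumptions made for illustration. It computes the Pearson correlation between execution time and energy, and reports whether the fastest variant is also the one with the lowest energy.

```python
import math


def time_energy_summary(variants):
    """Summarize the time/energy relationship across code variants.

    `variants` maps a variant name to a (seconds, joules) pair.
    Returns (pearson_r, fastest_name, lowest_energy_name).
    """
    names = list(variants)
    times = [variants[n][0] for n in names]
    energies = [variants[n][1] for n in names]

    # Pearson correlation coefficient, written out to avoid extra dependencies.
    mean_t = sum(times) / len(times)
    mean_e = sum(energies) / len(energies)
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(times, energies))
    var_t = sum((t - mean_t) ** 2 for t in times)
    var_e = sum((e - mean_e) ** 2 for e in energies)
    r = cov / math.sqrt(var_t * var_e)

    fastest = min(names, key=lambda n: variants[n][0])
    lowest_energy = min(names, key=lambda n: variants[n][1])
    return r, fastest, lowest_energy


# Invented placeholder measurements, NOT results from the paper.
example = {
    "baseline": (10.0, 500.0),
    "tiled": (8.0, 420.0),
    "tiled+vectorized": (7.5, 410.0),
}
r, fastest, lowest_energy = time_energy_summary(example)
print(f"r = {r:.3f}, fastest = {fastest}, lowest energy = {lowest_energy}")
```

If the correlation is close to 1 and the two reported names agree, tuning for time alone would have selected the minimum-energy variant for that data set.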
