Multi-Objective Auto-Tuning with Insieme: Optimization and Trade-Off Analysis for Time, Energy and Resource Usage

The increasing complexity of modern multi- and many-core hardware design makes performance tuning of parallel applications a difficult task. In the past, auto-tuners have been successfully applied to minimize execution time. However, besides execution time, additional optimization goals have recently arisen, such as energy consumption or computing costs. Therefore, more sophisticated methods capable of exploiting and identifying the trade-offs among these goals are required. In this work we present and discuss results of applying a multi-objective search-based auto-tuner to optimize for three conflicting criteria: execution time, energy consumption, and resource usage. We examine a method, called RS-GDE3, to tune HPC codes using the Insieme parallelizing and optimizing compiler. Our results demonstrate that RS-GDE3 offers solutions of superior quality than those provided by a hierarchical and a random search at a fraction of the required time (5%) or energy (8%). A comparison to a state-of-the-art multi-objective optimizer (NSGA-II) shows that RS-GDE3 computes solutions of higher quality. Finally, based on the trade-off solutions found by RS-GDE3, we provide a detailed analysis and several hints on how to improve the design of multi-objective auto-tuners and code optimization.

[1]  Hermann Härtig,et al.  Measuring energy consumption for short code paths using RAPL , 2012, PERV.

[2]  Feng Pan,et al.  Analyzing the Energy-Time Trade-Off in High-Performance Computing Applications , 2007, IEEE Transactions on Parallel and Distributed Systems.

[3]  I-Hsin Chung,et al.  Active Harmony: Towards Automated Performance Tuning , 2002, ACM/IEEE SC 2002 Conference (SC'02).

[4]  David K. Lowenthal,et al.  Using multiple energy gears in MPI programs on a power-scalable cluster , 2005, PPoPP.

[5]  Lothar Thiele,et al.  Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach , 1999, IEEE Trans. Evol. Comput..

[6]  Randy H. Katz,et al.  Above the Clouds: A Berkeley View of Cloud Computing , 2009 .

[7]  Kirk W. Cameron,et al.  Energy and performance characteristics of different parallel implementations of scientific applications on multicore systems , 2011, Int. J. High Perform. Comput. Appl..

[8]  M.J. Flynn,et al.  Deep submicron microprocessor design issues , 1999, IEEE Micro.

[9]  Ananta Tiwari,et al.  Auto-tuning for Energy Usage in Scientific Applications , 2011, Euro-Par Workshops.

[10]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[11]  Dong Li,et al.  Strategies for Energy-Efficient Resource Management of Hybrid Programming Models , 2013, IEEE Transactions on Parallel and Distributed Systems.

[12]  Jichi Guo,et al.  Studying the impact of application-level optimizations on the power consumption of multi-core architectures , 2012, CF '12.

[13]  Lieven Eeckhout,et al.  Cole: compiler optimization level exploration , 2008, CGO '08.

[14]  Xuejun Yang,et al.  Energy-Oriented OpenMP Parallel Loop Scheduling , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications.

[15]  Thomas Fahringer,et al.  A multi-objective auto-tuning framework for parallel codes , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[16]  Prasanna Balaprakash,et al.  Multi Objective Optimization of HPC Kernels for Performance, Power, and Energy , 2013, PMBS@SC.