A Region-Aware Multi-Objective Auto-Tuner for Parallel Programs

Auto-tuning has become increasingly popular for optimizing non-functional parameters of parallel programs. The typically large search space requires sophisticated techniques to find well-performing parameter values in a reasonable amount of time. Different parts of a program often perform best with different parameter values. We therefore subdivide programs into several regions and optimize the parameter values for each region separately, rather than setting them globally for the entire program. As this enlarges the search space even further, existing auto-tuning techniques have to be extended in order to obtain good results. In this paper we introduce a novel enhancement to the RS-GDE3 algorithm, which is used to explore the search space when auto-tuning programs with multiple regions with respect to several objectives. We have implemented our auto-tuner using the Insieme compiler and runtime system. Compared with a non-optimized parallel version of the tested programs, our approach achieves improvements of up to 7.6-, 10.5-, and 61.6-fold for the three tuned objectives: wall time, energy consumption, and resource usage, respectively.
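To make the search-space argument concrete, the following minimal Python sketch shows how per-region tuning multiplies the number of candidate configurations, and how candidates can be compared on several minimized objectives using Pareto dominance. It is only an illustration under stated assumptions: the parameter names (`threads`, `tile_size`), their value ranges, and all function names are hypothetical. The sketch does not reproduce RS-GDE3, its enhancement, or the Insieme API.

```python
# Hypothetical tunable parameters and candidate values (illustrative assumptions only).
PARAM_VALUES = {"threads": [1, 2, 4, 8], "tile_size": [16, 32, 64]}


def global_space_size(param_values):
    """Number of configurations when one setting applies to the whole program."""
    size = 1
    for values in param_values.values():
        size *= len(values)
    return size


def per_region_space_size(param_values, num_regions):
    """Number of configurations when every region is tuned independently."""
    return global_space_size(param_values) ** num_regions


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    All objectives (e.g. wall time, energy, resource usage) are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]
```

With these assumed values, a global setting has 12 configurations, while three independently tuned regions already have 12 ** 3 = 1728. This growth is why an exhaustive search quickly becomes impractical and a guided multi-objective search is needed.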
