A multi-objective auto-tuning framework for parallel codes

In this paper we introduce a multi-objective auto-tuning framework comprising compiler and runtime components. Focusing on individual code regions, our compiler uses a novel search technique to compute a set of optimal solutions, which are encoded into a multi-versioned executable. This enables the runtime system to choose specifically tuned code versions when dynamically adjusting to changing circumstances. We demonstrate our method by tuning loop tiling in cache-sensitive parallel programs, optimizing for both runtime and efficiency. Our static optimizer finds solutions matching or surpassing those determined by exhaustively sampling the search space on a regular grid, while using less than 4% of the computational effort on average. Additionally, we show that parallelism-aware multi-versioning approaches like our own achieve a performance improvement of up to 70% over solutions tuned for only one specific number of threads.
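The idea of keeping a set of trade-off solutions and selecting among them at run time can be illustrated with a minimal sketch. It is not the search technique of the paper: it simply filters a hand-written list of candidates down to its non-dominated set and then picks a version for a given thread budget. The names `Version`, `pareto_front`, and `select_version` are hypothetical, the numbers are made up for illustration and are not measurements, and the sketch assumes that runtime is minimized while efficiency is maximized.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Version:
    """One tuned code version (hypothetical record, not the paper's format)."""
    tile: Tuple[int, int]   # tile sizes of the tiled loop nest
    threads: int            # thread count the version was tuned for
    runtime: float          # seconds, lower is better
    efficiency: float       # assumed: higher is better


def dominates(a: Version, b: Version) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    no_worse = a.runtime <= b.runtime and a.efficiency >= b.efficiency
    better = a.runtime < b.runtime or a.efficiency > b.efficiency
    return no_worse and better


def pareto_front(candidates: List[Version]) -> List[Version]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates)]


def select_version(front: List[Version], available_threads: int) -> Version:
    """Pick the fastest stored version that fits the current thread budget."""
    usable = [v for v in front if v.threads <= available_threads]
    if not usable:
        # Nothing fits: fall back to the version needing the fewest threads.
        return min(front, key=lambda v: v.threads)
    return min(usable, key=lambda v: v.runtime)


# Made-up illustrative data, not results from the paper.
candidates = [
    Version((32, 32), 1, 10.0, 1.00),
    Version((64, 16), 1, 12.0, 0.83),  # dominated: slower and less efficient
    Version((32, 32), 4, 3.0, 0.83),
    Version((16, 64), 4, 3.5, 0.71),   # dominated by the other 4-thread version
    Version((64, 64), 8, 2.0, 0.62),
]

front = pareto_front(candidates)
for budget in (2, 4, 16):
    chosen = select_version(front, budget)
    print(budget, chosen.tile, chosen.threads, chosen.runtime)
```

In this sketch the front plays the role of the multi-versioned executable: every retained entry is a distinct trade-off, so the selection step can react to a changed thread budget without re-tuning.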
