Exploiting Historical Data: Pruning Autotuning Spaces and Estimating the Number of Tuning Steps

Autotuning, the automatic tuning of code to provide performance portability, has received increased attention in the research community, especially in high-performance computing. Achieving high performance on a variety of hardware usually requires modifying the code, often by choosing different values for a selected set of parameters such as tile size, loop unrolling factor, or data layout. However, the search space of all possible combinations of these parameters can be enormous, and traditional search methods often fail to find a well-performing set of parameter values quickly.
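
To give a feel for how quickly such a search space grows, the following minimal Python sketch counts and enumerates the configurations of a small, purely hypothetical tuning space. The parameter names and candidate values are assumptions chosen for illustration and are not taken from any particular autotuner or kernel.

```python
import itertools
import math

# Hypothetical tuning parameters and candidate values (illustrative only;
# real kernels typically expose more parameters and constraints).
TUNING_PARAMETERS = {
    "tile_size_x": [1, 2, 4, 8, 16, 32],
    "tile_size_y": [1, 2, 4, 8, 16, 32],
    "unroll_factor": [1, 2, 4, 8],
    "data_layout": ["AoS", "SoA"],
    "vector_width": [1, 2, 4],
}


def search_space_size(parameters):
    """Return the number of configurations in the full Cartesian product."""
    return math.prod(len(values) for values in parameters.values())


def enumerate_configurations(parameters):
    """Yield every configuration as a dict mapping parameter name to value."""
    names = list(parameters)
    for combination in itertools.product(*(parameters[n] for n in names)):
        yield dict(zip(names, combination))


if __name__ == "__main__":
    # 6 * 6 * 4 * 2 * 3 = 864 configurations for only five parameters.
    print(search_space_size(TUNING_PARAMETERS))
```

Even this toy space contains 864 configurations, and each added parameter multiplies the count by its number of candidate values. Because every configuration must be compiled and benchmarked, exhaustive enumeration becomes impractical for realistic spaces.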
