ATF: A generic directive-based auto-tuning framework

We describe the Auto-Tuning Framework (ATF), a simple-to-use, generic approach, and its implementation as a framework for automatic program optimization. ATF chooses the most suitable values of program parameters such as the number of parallel threads, tile sizes, etc. It offers four major advantages over state-of-the-art auto-tuning approaches: i) it is generic with respect to the programming language, application domain, tuning objective (e.g., high performance and/or low energy consumption), and search technique; ii) it can auto-tune a broader class of applications by allowing tuning parameters to be interdependent, e.g., when one parameter must be divisible by another; iii) it allows tuning parameters to have substantially larger ranges, thanks to an optimized search space generation process; and iv) it is arguably simpler to use, e.g., the ATF user prepares an application for auto-tuning by annotating its source code with simple tuning directives. We demonstrate ATF's efficacy by comparing it to the state-of-the-art auto-tuning approaches OpenTuner and CLTune; ATF achieves better tuning results with less programmer effort.
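Points ii) and iii) can be illustrated with a small sketch. Assume, hypothetically, two interdependent tuning parameters: `WG`, a work-group size that must divide a problem size `N`, and `TS`, a tile size that must divide `WG`. The code contrasts a naive generator, which enumerates the full Cartesian product and filters it afterwards, with a generator that checks each constraint as soon as its parameter is fixed, so invalid partial configurations are never extended. This pruning strategy is one plausible way to make search space generation cheaper; it is not presented as ATF's actual algorithm or API. The names `TP`, `naive_space`, `pruned_space`, `exhaustive_search`, and `toy_cost` are all hypothetical, and `toy_cost` merely stands in for a measured objective such as runtime or energy.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence

# A constraint receives a candidate value and the values already chosen for
# parameters declared *earlier*; it returns True if the value is allowed.
Constraint = Callable[[int, Dict[str, int]], bool]


class TP:
    """Hypothetical tuning parameter: name, candidate range, optional constraint."""

    def __init__(self, name: str, values: Sequence[int],
                 constraint: Optional[Constraint] = None):
        self.name = name
        self.values = list(values)
        self.constraint = constraint or (lambda v, cfg: True)


def naive_space(params: List[TP]) -> List[Dict[str, int]]:
    """Enumerate every combination, then discard the invalid ones."""
    space = []
    for combo in product(*(p.values for p in params)):
        cfg = dict(zip((p.name for p in params), combo))
        if all(p.constraint(cfg[p.name], cfg) for p in params):
            space.append(cfg)
    return space


def pruned_space(params: List[TP]) -> List[Dict[str, int]]:
    """Depth-first generation: a constraint is checked as soon as its
    parameter is fixed, so an invalid prefix is never extended."""
    space: List[Dict[str, int]] = []

    def extend(i: int, cfg: Dict[str, int]) -> None:
        if i == len(params):
            space.append(dict(cfg))  # store a copy of the complete config
            return
        p = params[i]
        for v in p.values:
            if p.constraint(v, cfg):
                cfg[p.name] = v
                extend(i + 1, cfg)
                del cfg[p.name]  # backtrack

    extend(0, {})
    return space


def exhaustive_search(space: List[Dict[str, int]],
                      cost: Callable[[Dict[str, int]], float]) -> Dict[str, int]:
    """Simplest search technique: evaluate every configuration, keep the cheapest."""
    return min(space, key=cost)


N = 64  # hypothetical problem size
params = [
    TP("WG", range(1, N + 1), lambda v, cfg: N % v == 0),        # WG divides N
    TP("TS", range(1, N + 1), lambda v, cfg: cfg["WG"] % v == 0),  # TS divides WG
]


def toy_cost(cfg: Dict[str, int]) -> float:
    # Placeholder objective; a real tuner would run and measure the program.
    return abs(cfg["WG"] - 16) + cfg["TS"]


if __name__ == "__main__":
    space = pruned_space(params)
    print(len(space), "valid configurations out of", N * N)  # 28 out of 4096
    print("best:", exhaustive_search(space, toy_cost))
```

Both generators yield the same 28 valid configurations, but the pruned one never builds the 4096-element product: it tests each `TS` candidate only for the 7 `WG` values that already divide `N`. The gap widens quickly as parameter ranges grow, which is why constraint-aware generation matters for large ranges. Swapping `exhaustive_search` for another strategy, or `toy_cost` for another objective, leaves the search space code untouched.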
