Refactoring for performance optimization and porting to new architectures have become particularly challenging in the era of accelerator and offload computing, where CSE software must be adapted to offload processing to highly parallel, exotic computational devices. The transformations involve many low-level details and often affect the organization of data structures, the layout of loop nests, and other pieces of code. Data structures must be optimized for vectorized operations, fully or partially duplicated, and moved from the host to the accelerator device. Loops often need their iteration space rearranged (tiling, fusion, etc.) and shared between host and accelerator devices. Code needs to be moved around: extracted, synchronized, or repackaged for testing; specialized for optimization; generalized for reuse; and cloned for cross-compilation. Specifically, it is desirable to streamline the process of identifying and transforming computational kernels with as little disruption to the code base as possible. This, we argue, requires new compilation abstractions that can be defined and directed by the CSE developer to complement the compilation toolchain. Many tools, such as polyhedral compilers and stencil optimization toolkits, expect the user to pre-process, isolate, or otherwise “normalize” the source code before feeding it to them. While these tools are backed by formal theory, so that their transformations can be validated, the whole task can be labor-intensive, and even impossible without removing the offending code. Our collective experience supporting computational scientists with in-house and vendor-requested tools shows that developers want to transform sources with the least possible effort in order to explore an optimization path they have worked out on paper; they simply want to try out an idea. Productivity is not confined to just being able to describe, and have a system that implements, a custom refactorization