Interactive Composition of Compiler Optimizations

Conventional compilers provide limited external control over the optimizations they automatically apply to attain high performance. Because a compiler has difficulty understanding the higher-level semantics of user applications, these optimizations have become increasingly ineffective. This paper presents a framework that gives external users interactive, fine-grained control of compiler optimizations as part of an integrated program development environment. Through a source-level optimization specification language and a graphical user interface (GUI), users can interactively select regions of their source code as optimization targets, then explicitly compose the optimizations and configure how each should be applied to maximize performance. The optimization specifications can then be downloaded and fed into a backend transformation engine, which empirically tunes the optimization configurations on varying architectures. When used to optimize a collection of matrix and stencil kernels, our framework attained average speedups of 1.84X and 3.83X compared with using icc and gcc alone, respectively.
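To make the idea of empirically tuning an optimization configuration concrete, the following is a minimal sketch, not the framework's actual specification language or transformation engine. It assumes a single optimization (loop tiling of a matrix multiplication) with a single parameter (the tile size), and it picks the fastest candidate by timing each one on the current machine. The names `matmul_tiled` and `tune_tile_size` and the candidate values are hypothetical and chosen only for illustration.

```python
import time


def matmul_tiled(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using loop tiling."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile of the iteration space at a time.
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def tune_tile_size(n, candidates, repeats=3):
    """Time each candidate tile size and return the fastest one.

    Returns (best_tile, timings), where timings maps each candidate
    to its best observed wall-clock time in seconds.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            matmul_tiled(a, b, n, tile)
            best = min(best, time.perf_counter() - start)
        timings[tile] = best
    best_tile = min(timings, key=timings.get)
    return best_tile, timings


if __name__ == "__main__":
    best_tile, timings = tune_tile_size(64, [8, 16, 32, 64])
    print("selected tile size:", best_tile)
```

The selected tile size depends on the machine and on timing noise, so no particular result should be expected. A real tuning engine would search over many composed optimizations and their parameters rather than a single one.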
