Practical Run-time Adaptation with Procedure Cloning to Enable Continuous Collective Compilation

Iterative feedback-directed optimization is now a popular technique for obtaining better performance and smaller code for statically compiled programs than a compiler's default settings provide. Evaluating multiple optimization strategies for a given program offline is potentially costly: the number of iterations typically grows with the complexity of the program-transformation search space and with the number of input datasets used for performance assessment. Moreover, because a program's behavior can vary considerably across datasets, it is often preferable to generate several optimized versions covering the full spectrum of the program's representative datasets. Continuous and collective optimization address these issues. Continuous optimization searches for the best program transformation at run-time, taking advantage of the phase behavior of programs to evaluate multiple optimized versions within a single run and adapting dynamically to changing execution contexts. Collective optimization interleaves optimization iterations with program executions over the lifetime of the program. In both cases, the user expects the optimization process to learn from past execution contexts and program behavior, and assumes the system will be fully transparent, incurring negligible overhead for the incremental profiling, learning, decision and code-generation steps while delivering significant performance benefits over the program's lifetime. To explore multiple optimization options, we propose a simple and practical solution: clone all procedures, apply arbitrarily complex optimizations to the clones, and randomly select either the original or the transformed procedure at run-time. By measuring the distribution of execution times between original and cloned procedures, we can statistically determine the influence of compiler optimizations on the code in a single run.
The simplicity of the implementation makes this technique reliable, secure and easy to debug, yet it enables practical, transparent, low-overhead continuous optimization of programs statically compiled with GCC while avoiding complex dynamic recompilation frameworks. In addition, our framework can enable fine-grained program self-adaptation for different environments such as parallel heterogeneous and
