Quick and Practical Run-Time Evaluation of Multiple Program Optimizations

This article aims to make iterative optimization practical and usable by speeding up the evaluation of a large range of optimizations. Instead of using a full run to evaluate a single program optimization, we take advantage of periods of stable performance, called phases. To that end, we propose a low-overhead phase detection scheme geared toward fast optimization space pruning, using code instrumentation and versioning implemented in a production compiler. Our approach is driven by simplicity and practicality. We show that a simple phase detection scheme can be sufficient for optimization space pruning. We also show that it is possible to search for complex optimizations at run-time without resorting to sophisticated dynamic compilation frameworks. Beyond iterative optimization, our approach also enables one to quickly design self-tuned applications. On five representative SPECfp2000 benchmarks, our approach speeds up the iterative search for the best program optimizations by a factor of 32 to 962. Phase prediction is 99.4% accurate on average, with an overhead of only 2.6%. The resulting self-tuned implementations achieve an average speedup of 1.4.
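
The idea of evaluating several code versions inside the stable phases of a single run can be illustrated with a minimal sketch. It is not the implementation described in the article, which relies on instrumentation and versioning inserted by a production compiler. The names `is_stable`, `pick_best_version`, and `cost_of`, the window size, and the 5% tolerance are all assumptions made for illustration, and `cost_of` stands in for a timer or hardware counter reading.

```python
from statistics import mean


def is_stable(samples, tolerance=0.05):
    """Return True when every sample lies within `tolerance` of the mean."""
    if len(samples) < 2:
        return False
    m = mean(samples)
    return all(abs(s - m) <= tolerance * m for s in samples)


def pick_best_version(cost_of, versions, calls, window=3, tolerance=0.05):
    """Evaluate candidate versions of a code section during one run.

    cost_of(version, call_index) is a hypothetical measurement hook that
    returns the cost of executing `version` on the given call.
    versions[0] is treated as the baseline (original) version.
    """
    baseline = versions[0]
    pending = list(versions[1:])   # candidates not yet evaluated
    scores = {}                    # version -> mean cost in a stable window
    history = []                   # recent baseline costs
    current, samples = None, []    # candidate under evaluation, if any

    for i in range(calls):
        if current is None:
            # Run the baseline until its cost is stable over `window` calls.
            history = (history + [cost_of(baseline, i)])[-window:]
            if pending and len(history) == window and is_stable(history, tolerance):
                scores[baseline] = mean(history)
                current, samples = pending.pop(0), []
        else:
            # Inside a stable phase: measure the candidate for `window` calls.
            samples.append(cost_of(current, i))
            if len(samples) == window:
                if is_stable(samples, tolerance):
                    scores[current] = mean(samples)
                # Return to the baseline and re-check stability before
                # trying the next candidate.
                current, history = None, []

    # Fall back to the baseline if no stable phase was ever observed.
    return min(scores, key=scores.get) if scores else baseline


if __name__ == "__main__":
    # Simulated constant costs per version (made-up numbers).
    costs = {"base": 10.0, "unroll": 7.0, "tile": 8.5}
    best = pick_best_version(lambda v, i: costs[v], list(costs), calls=20)
    print(best)
```

In this sketch each candidate costs only a few calls of the instrumented section rather than a full program run, which is the source of the search speedup the abstract refers to. A candidate whose measurements are unstable is simply discarded, on the assumption that the phase changed during its evaluation.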
