Collective Tuning Initiative: automating and accelerating development and optimization of computing systems

Computing systems rarely deliver the best possible performance due to ever-increasing hardware and software complexity and the limitations of current optimization technology. Additional code and architecture optimizations are often required to improve execution time, size, power consumption, reliability and other important characteristics of computing systems. However, this is often a tedious, repetitive, isolated and time-consuming process. In order to automate, simplify and systematize program optimization and architecture design, we are developing an open-source, modular, plugin-based Collective Tuning Infrastructure (http://ctuning.org) that can distribute the optimization process and leverage the optimization experience of multiple users. The core of this infrastructure is a Collective Optimization Database that allows easy collection, sharing, characterization and reuse of a large number of optimization cases from the community. The infrastructure also includes collaborative R&D tools with a common API (Continuous Collective Compilation Framework, MILEPOST GCC with Interactive Compilation Interface and static feature extractor, Collective Benchmark and Universal Run-time Adaptation Framework) to automate optimization, produce adaptive applications and enable realistic benchmarking. We have developed several tools and open web services that substitute the default compiler optimization heuristics and predict good optimizations for a given program, dataset and architecture based on static and dynamic program features and standard machine learning techniques. The collective tuning infrastructure provides a novel, fully integrated, collaborative, "one button" approach to improving existing underperforming computing systems, ranging from embedded architectures to high-performance servers, based on systematic iterative compilation, statistical collective optimization and machine learning. Our experimental results show that the execution time (and code size) of some programs from SPEC2006 and EEMBC, among others, can be reduced automatically by more than a factor of two. The infrastructure can also reduce development and testing time considerably. Together with the first production-quality, machine-learning-enabled interactive research compiler (MILEPOST GCC), this infrastructure opens up many research opportunities to study and develop future realistic self-tuning and self-organizing adaptive intelligent computing systems based on systematic statistical performance evaluation and benchmarking. Finally, using a common optimization repository is intended to improve the quality and reproducibility of research on architecture and code optimization.
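To make the two ideas the abstract combines more concrete, the following Python sketch (not part of the ctuning.org tools themselves) illustrates iterative compilation over compiler flag combinations and the reuse of previously collected optimization cases through a simple machine-learning predictor (nearest neighbour over program feature vectors). The candidate flags are real GCC options, but the benchmark binary name, the feature vectors and the repository layout are assumptions made purely for illustration.

    # Minimal sketch, assuming a single-file C benchmark and a hypothetical
    # repository of previously tuned programs: {"features": [...], "best_flags": [...]}.
    import random
    import subprocess
    import time

    CANDIDATE_FLAGS = ["-funroll-loops", "-ftree-vectorize", "-fomit-frame-pointer",
                       "-finline-functions", "-fno-strict-aliasing"]

    def compile_and_time(source, flags):
        """Compile the benchmark with one flag combination and time a single run."""
        subprocess.run(["gcc", "-O2", *flags, source, "-o", "bench"], check=True)
        start = time.perf_counter()
        subprocess.run(["./bench"], check=True)
        return time.perf_counter() - start

    def iterative_search(source, trials=50):
        """Random iterative search over flag combinations; keep the fastest one found."""
        best_flags, best_time = [], compile_and_time(source, [])
        for _ in range(trials):
            flags = [f for f in CANDIDATE_FLAGS if random.random() < 0.5]
            t = compile_and_time(source, flags)
            if t < best_time:
                best_flags, best_time = flags, t
        return best_flags, best_time

    def predict_flags(features, repository):
        """Nearest-neighbour prediction: reuse the best flags of the most similar
        previously optimized program, measured by distance between feature vectors."""
        def distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        closest = min(repository, key=lambda entry: distance(entry["features"], features))
        return closest["best_flags"]

In the actual infrastructure the search is distributed across users, the optimization cases are stored in the Collective Optimization Database, and the program features are extracted by MILEPOST GCC rather than supplied by hand; the sketch only shows the overall shape of the approach.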
