COBAYN: Compiler Autotuning Framework Using Bayesian Networks

The variety of today’s architectures forces programmers to spend a great deal of time porting and tuning application codes across different platforms. Compilers themselves need additional tuning, which is considerably complex because the standard optimization levels, usually designed for the average case and a specific target architecture, often fail to deliver the best results. This article proposes COBAYN: Compiler autotuning framework using BAYesian Networks, a compiler autotuning methodology that uses machine learning to speed up applications and to reduce the cost of the compiler optimization phases. The proposed framework is based on application characterization performed dynamically using microarchitecture-independent features and Bayesian networks. The article also presents an evaluation based on static analysis and hybrid feature collection approaches. In addition, the article compares Bayesian networks with several state-of-the-art machine-learning models. Experiments were carried out on an ARM embedded platform with the GCC compiler, considering two benchmark suites with 39 applications. The set of compiler configurations selected by the model (less than 7% of the search space) demonstrated an application performance speedup of up to 4.6× on Polybench (1.85× on average) and 3.1× on cBench (1.54× on average) with respect to standard optimization levels. Moreover, the comparison of the proposed technique with (i) random iterative compilation, (ii) machine-learning-based iterative compilation, and (iii) noniterative predictive modeling techniques shows, on average, 1.2×, 1.37×, and 1.48× speedup, respectively. Finally, the proposed method demonstrates 4× and 3× speedup on cBench and Polybench, respectively, in terms of exploration efficiency, given the same quality of the solutions generated by the random iterative compilation model.
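The difference between random iterative compilation and model-guided sampling of compiler configurations can be illustrated with a minimal sketch. It is not the article's implementation: independent per-flag probabilities stand in for the Bayesian network, the runtime model is synthetic, and the names `sample_config`, `synthetic_runtime`, `explore`, and `speedup` are hypothetical. Only the GCC flag names are real.

```python
import random

# Real GCC flags; the choice of these four is an assumption for illustration.
FLAGS = ["-funroll-loops", "-ftree-vectorize", "-fgcse", "-finline-functions"]

# Hypothetical runtime multipliers applied when the matching flag is enabled.
FACTORS = [0.8, 0.7, 1.1, 0.9]
BASE_RUNTIME = 10.0  # synthetic runtime with every flag disabled


def sample_config(probs, rng):
    """Draw one on/off setting per flag from independent probabilities.

    A Bayesian network would also model dependencies between flags and
    condition on program features; this sketch deliberately does not.
    """
    return tuple(rng.random() < p for p in probs)


def synthetic_runtime(config):
    """Toy cost model: each enabled flag scales the base runtime."""
    runtime = BASE_RUNTIME
    for enabled, factor in zip(config, FACTORS):
        if enabled:
            runtime *= factor
    return runtime


def explore(probs, budget, seed=0):
    """Evaluate `budget` sampled configurations and keep the fastest one."""
    rng = random.Random(seed)
    samples = [sample_config(probs, rng) for _ in range(budget)]
    best = min(samples, key=synthetic_runtime)
    return best, synthetic_runtime(best)


def speedup(runtime):
    """Speedup relative to the all-flags-off baseline."""
    return BASE_RUNTIME / runtime


if __name__ == "__main__":
    uniform = [0.5] * len(FLAGS)        # random iterative compilation
    guided = [0.9, 0.9, 0.1, 0.9]       # stand-in for a learned distribution
    for name, probs in (("random", uniform), ("guided", guided)):
        config, runtime = explore(probs, budget=4)
        enabled = [f for f, on in zip(FLAGS, config) if on]
        print(name, enabled, round(speedup(runtime), 2))
```

With a small evaluation budget, a distribution concentrated on good flag settings tends to reach a given solution quality in fewer samples than uniform sampling, which is the sense of "exploration efficiency" used above.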
