Machine Learning in Compiler Optimization

In the last decade, machine-learning-based compilation has moved from an obscure research niche to a mainstream activity. In this paper, we describe the relationship between machine learning and compiler optimization and introduce the main concepts of features, models, training, and deployment. We then provide a comprehensive survey and a road map for the wide variety of research areas. We conclude with a discussion of open issues in the area and potential research directions. This paper provides both an accessible introduction to the fast-moving area of machine-learning-based compilation and a detailed bibliography of its main achievements.
