Paper Information - Machine Learning in Compiler Optimization

Machine Learning in Compiler Optimization

In the last decade, machine-learning-based compilation has moved from an obscure research niche to a mainstream activity. In this paper, we describe the relationship between machine learning and compiler optimization and introduce the main concepts of features, models, training, and deployment. We then provide a comprehensive survey and a road map for the wide variety of research areas. We conclude with a discussion of open issues in the area and potential research directions. This paper provides both an accessible introduction to the fast-moving area of machine-learning-based compilation and a detailed bibliography of its main achievements.
