Customized Monte Carlo Tree Search for LLVM/Polly's Composable Loop Optimization Transformations
[1] Thierry Moreau,et al. Learning to Optimize Tensor Programs , 2018, NeurIPS.
[2] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[3] Jonathan Ragan-Kelley,et al. Automatically scheduling halide image processing pipelines , 2016, ACM Trans. Graph.
[4] Ion Stoica,et al. NeuroVectorizer: end-to-end vectorization with deep reinforcement learning , 2020, CGO.
[5] Torsten Hoefler,et al. Polly-ACC: Transparent compilation to heterogeneous hardware , 2016, ICS.
[6] Hadi Esmaeilzadeh,et al. Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation , 2019, ArXiv.
[7] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett.
[8] Helmar Burkhart,et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[9] Giuseppe Paolo Ernesto Toffanin Zingales. HalideTuner : generating and tuning halide schedules with Opentuner , 2015.
[10] Csaba Szepesvári,et al. Bandit Based Monte-Carlo Planning , 2006, ECML.
[11] Frédo Durand,et al. Learning to optimize halide with tree search and random programs , 2019, ACM Trans. Graph.
[12] P. Sadayappan,et al. Using machine learning to improve automatic vectorization , 2012, TACO.
[13] John Wawrzynek,et al. ProTuner: Tuning Programs with Monte Carlo Tree Search , 2020, ArXiv.
[14] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[15] Hal Finkel,et al. User-Directed Loop-Transformations in Clang , 2018, 2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC).
[16] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[17] Hal Finkel,et al. Autotuning Search Space for Loop Transformations , 2020, 2020 IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar).
[18] H. Jaap van den Herik,et al. Investigations with Monte Carlo Tree Search for Finding Better Multivariate Horner Schemes , 2013, ICAART.
[19] J. Cavazos,et al. Partnership for Advanced Computing in Europe Performance Improvement in Kernels by Guiding Compiler Auto-Vectorization Heuristics , 2014.
[20] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[21] Uday Bondhugula,et al. A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.
[22] Shoaib Kamil,et al. OpenTuner: An extensible framework for program autotuning , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[23] Michael F. P. O'Boyle,et al. Milepost GCC: Machine Learning Enabled Self-tuning Compiler , 2011, International Journal of Parallel Programming.
[24] Sven Verdoolaege,et al. isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.
[25] Prasanna Balaprakash,et al. Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization , 2020, 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS).
[26] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965.
[27] Uri Alon,et al. code2vec: learning distributed representations of code , 2018, Proc. ACM Program. Lang.
[28] Michel Steuwer,et al. LIFT: A functional data-parallel IR for high-performance GPU code generation , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[29] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[30] David A. Padua,et al. A Language for the Compact Representation of Multiple Program Versions , 2005, LCPC.
[31] Michael F. P. O'Boyle,et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[32] Frédo Durand,et al. Decoupling algorithms from schedules for easy optimization of image processing pipelines , 2012, ACM Trans. Graph.
[33] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[34] Wei-Yin Loh,et al. Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov.
[35] Albert Cohen,et al. On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU , 2019, ArXiv.
[36] Prasanna Balaprakash,et al. Autotuning in High-Performance Computing Applications , 2018, Proceedings of the IEEE.
[37] Hal Finkel,et al. Design and Use of Loop-Transformation Pragmas , 2019, IWOMP.
[38] Chris Cummins,et al. End-to-End Deep Learning of Optimization Heuristics , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[39] Sergei Gorlatch,et al. ATF: A Generic Auto-Tuning Framework , 2017, 2017 IEEE 19th International Conference on High Performance Computing and Communications; IEEE 15th International Conference on Smart City; IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS).
[40] Sergei Gorlatch,et al. High performance stencil code generation with Lift , 2018, CGO.