Domain-Specific Multi-Level IR Rewriting for GPU
Torsten Hoefler | Tobias Gysi | Tobias Grosser | Oliver Fuhrer | Tobias Wicky | Stephan Herhut | Oleksandr Zinenko | Eddie Davis | Christoph Müller
[1] Albert Cohen,et al. Violated dependence analysis , 2006, ICS '06.
[2] M. Wegman,et al. Global value numbers and redundant computations , 1988, POPL '88.
[3] Jan Vitek,et al. Terra: a multi-stage language for high-performance computing , 2013, PLDI.
[4] Torsten Hoefler,et al. MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures , 2015, ICS.
[5] Alejandro Duran,et al. YASK—Yet Another Stencil Kernel: A Framework for HPC Stencil Code-Generation and Tuning , 2016, 2016 Sixth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC).
[6] Eike Hermann Müller,et al. LFRic: Meeting the challenges of scalability and performance portability in Weather and Climate models , 2018, J. Parallel Distributed Comput..
[7] Raja , 2019, La Generación sin Nombre. Una antología.
[8] Taylor Graham. Dawn , 2000 .
[9] M. Baldauf,et al. Operational Convective-Scale Numerical Weather Prediction with the COSMO Model: Description and Sensitivities , 2011 .
[10] Michel Steuwer,et al. LIFT: A functional data-parallel IR for high-performance GPU code generation , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[11] Uday Bondhugula,et al. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation , 2021, 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[12] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[13] W. M. McKeeman,et al. Peephole optimization , 1965, CACM.
[14] Philipp Slusallek,et al. AnyDSL: a partial evaluation framework for programming high-performance libraries , 2018, Proc. ACM Program. Lang..
[15] Torsten Hoefler,et al. Dawn: a High-level Domain-Specific Language Compiler Toolchain for Weather and Climate Applications , 2020, Supercomput. Front. Innov..
[16] Robert Pincus,et al. The CLAW DSL: Abstractions for Performance Portable Weather and Climate Models , 2018, PASC.
[17] Mohamed Wahib,et al. Scalable Kernel Fusion for Memory-Bound GPU Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[18] Marc Pouzet,et al. Optimization space pruning without regrets , 2017, CC.
[19] Albert Cohen,et al. Split tiling for GPUs: automatic parallelization using trapezoidal tiles , 2013, GPGPU@ASPLOS.
[20] G. McMechan. MIGRATION BY EXTRAPOLATION OF TIME‐DEPENDENT BOUNDARY VALUES* , 1983 .
[21] Mary W. Hall,et al. Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs , 2019, SC.
[22] Tobias Gysi,et al. Towards a performance portable, architecture agnostic implementation strategy for weather and climate models , 2014, Supercomput. Front. Innov..
[23] Kevin Skadron,et al. Scalable parallel programming , 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).
[24] P. Sadayappan,et al. Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations , 2018, Proceedings of the IEEE.
[25] Hal Finkel,et al. User-Directed Loop-Transformations in Clang , 2018, 2018 IEEE/ACM 5th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC).
[26] Daniel J. Quinlan. ROSE: Compiler Support for Object-Oriented Frameworks , 2000, Parallel Process. Lett..
[27] H. Carter Edwards,et al. Kokkos: Enabling Performance Portability Across Manycore Architectures , 2013, 2013 Extreme Scaling Workshop (xsw 2013).
[28] Bradley C. Kuszmaul,et al. The pochoir stencil compiler , 2011, SPAA '11.
[29] P. Sadayappan,et al. Effective resource management for enhancing performance of 2D and 3D stencils on GPUs , 2016, GPGPU@PPoPP.
[30] Scott B. Baden,et al. Panda: A Compiler Framework for Concurrent CPU+GPU Ex , 2016, International Journal of Parallel Programming.
[31] Kunle Olukotun,et al. Delite , 2014, ACM Trans. Embed. Comput. Syst..
[32] Shian-Jiann Lin,et al. A Two-Way Nested Global-Regional Dynamical Core on the Cubed-Sphere Grid , 2013 .
[33] Mohamed Wahib,et al. AN5D: automated stencil framework for high-degree temporal blocking on GPUs , 2020, CGO.
[34] Frédo Durand,et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines , 2013, PLDI 2013.
[35] Francky Catthoor,et al. Polyhedral parallel code generation for CUDA , 2013, TACO.
[36] Haichen Shen,et al. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning , 2018, OSDI.
[37] Sergei Gorlatch,et al. High performance stencil code generation with Lift , 2018, CGO.
[38] Steven S. Muchnick,et al. Advanced Compiler Design and Implementation , 1997 .
[39] Pradeep Dubey,et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[40] Naoya Maruyama,et al. Optimizing Stencil Computations for NVIDIA Kepler GPUs , 2014 .
[41] Tobias Gysi,et al. STELLA: a domain-specific tool for structured grid methods in weather and climate models , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[42] Elnar Hajiyev,et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[43] Christian Lengauer,et al. Polly - Performing Polyhedral Optimizations on a Low-Level Intermediate Representation , 2012, Parallel Process. Lett..
[44] Frédo Durand,et al. Learning to optimize halide with tree search and random programs , 2019, ACM Trans. Graph..
[45] P. Sadayappan,et al. Register optimizations for stencils on GPUs , 2018, PPoPP.
[46] P. Sadayappan,et al. High-performance code generation for stencil computations on GPU architectures , 2012, ICS '12.
[47] Torsten Hoefler,et al. Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures , 2019, SC.
[48] Torsten Hoefler,et al. Polly-ACC Transparent compilation to heterogeneous hardware , 2016, ICS.
[49] Torsten Hoefler,et al. Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot , 2019, 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[50] Albert Cohen,et al. Hybrid Hexagonal/Classical Tiling for GPUs , 2014, CGO '14.
[51] Martin Odersky,et al. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs , 2010, GPCE '10.
[52] Vivek Sarkar,et al. Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling , 2018, CC.
[53] J. Ramanujam,et al. SDSLc: a multi-target domain-specific compiler for stencil computations , 2015, WOLFHPC@SC.
[54] Alexandros Nikolaos Ziogas,et al. A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations , 2019, SC.
[55] Uday Bondhugula,et al. MLIR: A Compiler Infrastructure for the End of Moore's Law , 2020, ArXiv.
[56] Takayuki Aoki,et al. Hybrid Fortran: High Productivity GPU Porting Framework Applied to Japanese Weather Prediction Model , 2017, WACCPD@SC.
[57] Uday Bondhugula,et al. PolyMage: Automatic Optimization for Image Processing Pipelines , 2015, ASPLOS.