AN5D: automated stencil framework for high-degree temporal blocking on GPUs
Mohamed Wahib | Kazuaki Matsumura | Hamid Reza Zohouri | Toshio Endo | Satoshi Matsuoka