An analytical GPU performance model for 3D stencil computations from the angle of data traffic
[1] Mauricio Araya-Polo,et al. Algorithm 942, 2014.
[2] Richard W. Vuduc,et al. A performance analysis framework for identifying potential benefits in GPGPU applications , 2012, PPoPP '12.
[3] Samuel Williams,et al. Implicit and explicit optimizations for stencil computations , 2006, MSPC '06.
[4] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.
[5] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[6] Kevin Skadron,et al. Performance modeling and automatic ghost zone optimization for iterative stencil loops on GPUs , 2009, ICS.
[7] Samuel Williams,et al. An auto-tuning framework for parallel multicore stencil computations , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[8] Apan Qasem,et al. Understanding stencil code performance on multicore architectures , 2011, CF '11.
[9] P. Sadayappan,et al. High-performance code generation for stencil computations on GPU architectures , 2012, ICS '12.
[10] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[11] Nan Wu,et al. On the GPU-CPU Performance Portability of OpenCL for 3D Stencil Computations , 2013, 2013 International Conference on Parallel and Distributed Systems.
[12] Frank Mueller,et al. Auto-generation and auto-tuning of 3D stencil codes on GPU clusters , 2012, CGO '12.
[13] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[14] Paulius Micikevicius,et al. 3D finite difference computation on GPUs using CUDA , 2009, GPGPU-2.
[15] Leonid Oliker,et al. Impact of modern memory subsystems on cache optimizations for stencil computations , 2005, MSP '05.
[16] Mauricio Araya-Polo,et al. Modeling Stencil Computations on Modern HPC Architectures , 2014, PMBS@SC.
[17] Gerhard Wellein,et al. Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model , 2014, ICS.
[18] Nan Wu,et al. On the GPU Performance of 3D Stencil Computations Implemented in OpenCL , 2013, ISC.
[19] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] William J. Dally,et al. The GPU Computing Era , 2010, IEEE Micro.
[21] Henk Corporaal,et al. A detailed GPU cache model based on reuse distance theory , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[22] Samuel Williams,et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors , 2007, SIAM Rev..
[23] Dietmar Fey,et al. High Performance Stencil Code Algorithms for GPGPUs , 2011, ICCS.
[24] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.