Modeling the Performance of 2.5D Blocking of 3D Stencil Code on GPUs
[1] Pradeep Dubey, et al. 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Naoya Maruyama, et al. Optimizing Stencil Computations for NVIDIA Kepler GPUs, 2014.
[3] Marcin Dabrowski, et al. Efficient 3D stencil computations using CUDA, 2013, Parallel Comput.
[4] Xing Cai, et al. An analytical GPU performance model for 3D stencil computations from the angle of data traffic, 2015, The Journal of Supercomputing.
[5] Alok Aggarwal, et al. The input/output complexity of sorting and related problems, 1988, CACM.
[6] P. Sadayappan, et al. Characterizing and enhancing global memory data coalescing on GPUs, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[7] Andreas Resios. GPU performance prediction using parametrized models, 2011.
[8] Cosmin Nita, et al. Optimized three-dimensional stencil computation on Fermi and Kepler GPUs, 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[9] Xinxin Mei, et al. Dissecting GPU Memory Hierarchy Through Microbenchmarking, 2015, IEEE Transactions on Parallel and Distributed Systems.
[10] Samuel Williams, et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] Henk Corporaal, et al. Demystifying the 16 × 16 thread‐block for stencils on the GPU, 2015, Concurr. Comput. Pract. Exp.
[12] Wen-mei W. Hwu, et al. Analytical Performance Prediction for Evaluation and Tuning of GPGPU Applications, 2009.