APR: A Novel Parallel Repacking Algorithm for Efficient GPGPU Parallel Code Transformation
Xin Chen | Xubin He | Yuxin Wang | He Guo | Weijun Xiao | Yulong Yu | Sihui Zhong
[1] Michael F. P. O'Boyle,et al. A large-scale cross-architecture evaluation of thread-coarsening , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[2] P. Sadayappan,et al. Optimal loop unrolling for GPGPU programs , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[3] Christoph W. Kessler,et al. SkePU: a multi-backend skeleton programming library for multi-GPU systems , 2010, HLPP '10.
[4] Henk Corporaal,et al. Introducing 'Bones': a parallelizing source-to-source compiler based on algorithmic skeletons , 2012, GPGPU-5.
[5] Scott B. Baden,et al. Accelerating a 3D Finite-Difference Earthquake Simulation with a C-to-CUDA Translator , 2012, Computing in Science & Engineering.
[6] Apan Qasem,et al. Automatic Restructuring of GPU Kernels for Exploiting Inter-thread Data Locality , 2012, CC.
[7] Long Chen,et al. Dynamic load balancing on single- and multi-GPU systems , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[8] Rudolf Eigenmann,et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[10] Tarek S. Abdelrahman,et al. hiCUDA: High-Level GPGPU Programming , 2011, IEEE Transactions on Parallel and Distributed Systems.
[11] Kevin Skadron,et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[12] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[13] Patrick Horain,et al. GpuCV: an opensource GPU-accelerated framework for image processing and computer vision , 2008, ACM Multimedia.
[14] Wen-mei W. Hwu,et al. Program optimization carving for GPU computing , 2008, J. Parallel Distributed Comput..
[15] Chun Chen,et al. A Programming Language Interface to Describe Transformations and Code Generation , 2010, LCPC.
[16] Jacqueline Chame,et al. A script-based autotuning compiler system to generate high-performance CUDA code , 2013, TACO.
[17] Francky Catthoor,et al. Polyhedral parallel code generation for CUDA , 2013, TACO.
[18] J. Ramanujam,et al. Automatic C-to-CUDA Code Generation for Affine Programs , 2010, CC.
[19] Jack J. Purdum,et al. C programming guide , 1983 .
[20] Robert M. Farber,et al. CUDA Application Design and Development , 2011 .
[21] Xiaolong Wu,et al. Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures , 2010, 2010 10th IEEE International Conference on Computer and Information Technology.
[22] Uday Bondhugula,et al. A compiler framework for optimization of affine loop nests for GPGPUs , 2008, ICS '08.
[23] Albert Cohen,et al. Putting Automatic Polyhedral Compilation for GPGPU to Work , 2011 .