Performance portable GPU code generation for matrix multiplication
Michel Steuwer | Christophe Dubach | Thibaut Lutz | Toomas Remmelg
[1] Timothy G. Mattson, et al. OpenCL Programming Guide, 2011.
[2] Sebastian Hack, et al. A graph-based higher-order intermediate representation, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[3] Trevor L. McDonell. Optimising purely functional GPU programs, 2013, ICFP.
[4] Elnar Hajiyev, et al. PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[5] Alan Edelman, et al. PetaBricks: a language and compiler for algorithmic choice, 2009, PLDI '09.
[6] Jack J. Dongarra, et al. Autotuning GEMM Kernels for the Fermi GPU, 2012, IEEE Transactions on Parallel and Distributed Systems.
[7] J. Ramanujam, et al. Automatic C-to-CUDA Code Generation for Affine Programs, 2010, CC.
[8] Edward G. Coffman, et al. Organizing matrices and matrix operations for paged memory systems, 1969, Commun. ACM.
[9] Sebastian Hack, et al. Whole-function vectorization, 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[10] Kunle Olukotun, et al. Locality-Aware Mapping of Nested Parallel Patterns on GPUs, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[11] H. Corporaal, et al. Bones, 2014, ACM Trans. Archit. Code Optim.
[12] Saman P. Amarasinghe, et al. Portable performance on heterogeneous architectures, 2013, ASPLOS '13.
[13] Stanislav G. Sedukhin, et al. Performance Tuning of Matrix Multiplication in OpenCL on Different GPUs and CPUs, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[14] Kunle Olukotun, et al. Delite, 2014, ACM Trans. Embed. Comput. Syst.
[15] Jack Dongarra, et al. clMAGMA: high performance dense linear algebra with OpenCL, 2014, IWOCL '14.
[16] André Seznec, et al. Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs, 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[17] Francky Catthoor, et al. Polyhedral parallel code generation for CUDA, 2013, TACO.
[18] David F. Bacon, et al. Compiling a high-level language for GPUs: (via language support for architectures and compilers), 2012, PLDI.
[19] Sam Lindley, et al. Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code, 2015, ICFP.
[20] Sergei Gorlatch, et al. SkelCL - A Portable Skeleton Library for High-Level GPU Programming, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.