A unified optimizing compiler framework for different GPGPU architectures
暂无分享,去创建一个
Yi Yang | Huiyang Zhou | Ping Xiang | Jingfei Kong | Mike Mantor | J. Kong | Yi Yang | Mike Mantor | Huiyang Zhou | Ping Xiang
[1] Jack Dongarra,et al. An Improved MAGMA GEMM for Fermi GPUs , 2010 .
[2] Wen-mei W. Hwu,et al. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs , 2008, LCPC.
[3] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Xipeng Shen,et al. A cross-input adaptive framework for GPU program optimizations , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[5] Albert Cohen,et al. Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[6] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[7] Wen-mei W. Hwu,et al. Program optimization space pruning for a multithreaded gpu , 2008, CGO '08.
[8] YangYi,et al. A unified optimizing compiler framework for different GPGPU architectures , 2012 .
[9] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, HiPC 2008.
[10] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[11] Uday Bondhugula,et al. A compiler framework for optimization of affine loop nests for gpgpus , 2008, ICS '08.
[12] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[13] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[14] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[15] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.
[16] Michael F. P. O'Boyle,et al. Using machine learning to focus iterative optimization , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[17] N. Fujimoto,et al. Faster matrix-vector multiplication on GeForce 8800GTX , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[18] Jack Dongarra,et al. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis , 2014, HiPC 2014.
[19] Naga K. Govindaraju,et al. High performance discrete Fourier transforms on graphics processors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[21] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.
[22] Rudolf Eigenmann,et al. Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation , 2003, LCPC.