The Implementation of a High Performance GPGPU Compiler
暂无分享,去创建一个
[1] Richard W. Vuduc,et al. Many-Thread Aware Prefetching Mechanisms for GPGPU Applications , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[2] Sam S. Stone,et al. MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores , 2011 .
[3] Rudolf Eigenmann,et al. Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation , 2003, LCPC.
[4] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[5] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[6] Yi Yang,et al. A unified optimizing compiler framework for different GPGPU architectures , 2012, TACO.
[7] Wen-mei W. Hwu,et al. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs , 2008, LCPC.
[8] Wen-mei W. Hwu,et al. Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.
[9] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.
[10] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[11] Xipeng Shen,et al. A cross-input adaptive framework for GPU program optimizations , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[12] Wen-mei W. Hwu,et al. Program optimization space pruning for a multithreaded gpu , 2008, CGO '08.