Compiler-Driven Data Layout Transformation for Heterogeneous Platforms
暂无分享,去创建一个
[1] David F. Bacon,et al. Compiling a high-level language for GPUs: (via language support for architectures and compilers) , 2012, PLDI.
[2] Bo Wu,et al. Complexity analysis and algorithm design for reorganizing data to minimize non-coalesced memory accesses on GPU , 2013, PPoPP '13.
[3] Wen-mei W. Hwu,et al. Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications , 2010, International Journal of Parallel Programming.
[4] Vivek Sarkar,et al. Habanero-Java: the new adventures of old X10 , 2011, PPPJ.
[5] James R. Larus,et al. SIMD parallelization of applications that traverse irregular data structures , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[6] Yao Zhang,et al. Improving Performance Portability in OpenCL Programs , 2013, ISC.
[7] Kevin Skadron,et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[8] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[9] Mahmut T. Kandemir,et al. Data layout optimization for GPGPU architectures , 2013, PPoPP '13.
[10] Wen-mei W. Hwu,et al. DL: A data layout transformation system for heterogeneous computing , 2012, 2012 Innovative Parallel Computing (InPar).
[11] Vivek Sarkar,et al. Integrating Asynchronous Task Parallelism with MPI , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[12] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[13] Vijay Saraswat,et al. GPU programming in a high level language: compiling X10 to CUDA , 2011, X10 '11.