COMP: Compiler Optimizations for Manycore Processors
暂无分享,去创建一个
Yi Yang | Min Feng | Srimat T. Chakradhar | Linhai Song | Nishkam Ravi | S. Chakradhar | N. Ravi | Yi Yang | Linhai Song | Min Feng
[1] Manuel M. T. Chakravarty,et al. Accelerating Haskell array codes with multicore GPUs , 2011, DAMP '11.
[2] Rudolf Eigenmann,et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[3] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[4] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[5] Uday Bondhugula,et al. A compiler framework for optimization of affine loop nests for gpgpus , 2008, ICS '08.
[6] David H. Bailey,et al. The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[7] G. Amdhal,et al. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[8] Christoforos E. Kozyrakis,et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[9] Satoshi Matsuoka,et al. CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application , 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[10] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[11] Christian Bienia,et al. Benchmarking modern multiprocessors , 2011 .
[12] Henry G. Dietz,et al. Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation , 1991, LCPC.
[13] Yi Yang,et al. Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors , 2012, ICS '12.
[14] Mithuna Thottethodi,et al. Nonlinear array layouts for hierarchical memory systems , 1999, ICS '99.
[15] Kim M. Hazelwood,et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.
[16] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[17] Avi Mendelson,et al. Programming model for a heterogeneous x86 platform , 2009, PLDI '09.
[18] Feng Liu,et al. Dynamically managed data for CPU-GPU architectures , 2012, CGO '12.
[19] R. Govindarajan,et al. Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.
[21] Mahmut T. Kandemir,et al. Optimizing Data Layouts for Parallel Computation on Multicores , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[22] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).