BigKernel -- High Performance CPU-GPU Communication Pipelining for Big Data-Style Applications
[1] Joel H. Saltz,et al. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures , 1994, J. Parallel Distributed Comput.
[2] Yi Yang,et al. A GPGPU compiler for memory optimization and parallelism management , 2010, PLDI '10.
[3] Weng-Fai Wong,et al. Scalable framework for mapping streaming applications onto multi-GPU systems , 2012, PPoPP '12.
[4] Tarek S. Abdelrahman,et al. hiCUDA: a high-level directive-based language for GPU programming , 2009, GPGPU-2.
[5] John E. Stone,et al. An asymmetric distributed shared memory model for heterogeneous parallel systems , 2010, ASPLOS XV.
[6] Rudolf Eigenmann,et al. OpenMP to GPGPU: a compiler framework for automatic translation and optimization , 2009, PPoPP '09.
[7] R. Govindarajan,et al. Fast and efficient automatic memory management for GPUs using compiler-assisted runtime coherence scheme , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[8] Xipeng Shen,et al. On-the-fly elimination of dynamic irregularities for GPU computing , 2011, ASPLOS XVI.
[9] Hiroshi Nakamura,et al. Communication Library to Overlap Computation and Communication for OpenCL Application , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[11] Joel H. Saltz,et al. Run-time and compile-time support for adaptive irregular problems , 1994, Proceedings of Supercomputing '94.
[12] David I. August,et al. Automatic CPU-GPU communication management and optimization , 2011, PLDI '11.
[13] Claire Cardie,et al. OpinionFinder: A System for Subjectivity Analysis , 2005, HLT.
[14] Feng Liu,et al. Dynamically managed data for CPU-GPU architectures , 2012, CGO '12.
[15] Vivek Sarkar,et al. JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA , 2009, Euro-Par.
[16] Kim M. Hazelwood,et al. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer , 2011, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[17] Wen-mei W. Hwu,et al. CUDA-Lite: Reducing GPU Programming Complexity , 2008, LCPC.
[18] Brucek Khailany,et al. CudaDMA: Optimizing GPU memory bandwidth via warp specialization , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[19] Isaac Y. Ho,et al. Meraculous: De Novo Genome Assembly with Short Paired-End Reads , 2011, PLoS ONE.
[20] Weng-Fai Wong,et al. Automated Architecture-Aware Mapping of Streaming Applications Onto GPUs , 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.