Achieving a single compute device image in OpenCL for multiple GPUs
Jungwon Kim | Jaejin Lee | Honggyu Kim | Joo Hwan Lee
[1] Jong-Deok Choi, et al. An OpenCL framework for heterogeneous multicores with local memory, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[2] Kai Lu, et al. Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing, 2010, 2010 IEEE International Conference on Cluster Computing.
[3] Bixia Zheng, et al. Twin Peaks: A Software Platform for Heterogeneous Computing on General-Purpose and Graphics Processors, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[4] Klaus Schulten, et al. Adapting a message-driven parallel application to GPU-accelerated clusters, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Steven S. Muchnick, et al. Advanced Compiler Design and Implementation, 1997.
[6] G. Amdahl, et al. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).
[7] Jie Cheng, et al. Programming Massively Parallel Processors. A Hands-on Approach, 2010, Scalable Comput. Pract. Exp.
[8] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[9] David W. Binkley, et al. Program slicing, 2008, 2008 Frontiers of Software Maintenance.
[10] James Demmel, et al. Benchmarking GPUs to tune dense linear algebra, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[11] David R. Kaeli, et al. Exploring the multiple-GPU design space, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[12] Frank Tip, et al. A survey of program slicing techniques, 1994, J. Program. Lang.
[13] Frederica Darema, et al. The SPMD Model: Past, Present and Future, 2001, PVM/MPI.
[14] Thomas Ertl, et al. CUDASA: Compute Unified Device and Systems Architecture, 2008, EGPGV@Eurographics.
[15] Robert A. van de Geijn, et al. Solving dense linear systems on platforms with multiple hardware accelerators, 2009, PPoPP '09.
[16] Steven S. Lumetta, et al. CUBA: an architecture for efficient CPU/co-processor data communication, 2008, ICS '08.
[17] Rob van Nieuwpoort, et al. The LOFAR correlator: implementation and performance analysis, 2010, PPoPP '10.
[18] Vikram S. Adve, et al. LLVM: a compilation framework for lifelong program analysis & transformation, 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[19] Yi Yang, et al. A GPGPU compiler for memory optimization and parallelism management, 2010, PLDI '10.