Encapsulated Synchronization and Load-Balance in Heterogeneous Programming
[1] Wenguang Chen, et al. MapCG: Writing parallel program portable between CPU and GPU, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[2] George Karypis, et al. Introduction to Parallel Computing Solution Manual, 2003.
[3] Arturo González-Escribano, et al. Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs, 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.
[4] Arturo González-Escribano, et al. Effortless and Efficient Distributed Data-Partitioning in Linear Algebra, 2010, 2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC).
[5] Arturo González-Escribano, et al. Automatic Data Partitioning Applied to Multigrid PDE Solvers, 2011, 2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing.
[6] Qing-kui Chen, et al. A Stream Processor Cluster Architecture Model with the Hybrid Technology of MPI and CUDA, 2009, 2009 First International Conference on Information Science and Engineering.
[7] John E. Stone, et al. An asymmetric distributed shared memory model for heterogeneous parallel systems, 2010, ASPLOS XV.
[8] D. N. Ranasinghe, et al. Accelerating high performance applications with CUDA and MPI, 2009, 2009 International Conference on Industrial and Information Systems (ICIIS).
[9] Wen-mei W. Hwu, et al. MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs, 2008, LCPC.
[10] Steven J. Deitz, et al. User-defined distributions and layouts in Chapel: philosophy and framework, 2010.
[11] Satnam Singh. Computing without Processors , 2011, ACM Queue.
[12] Robert A. van de Geijn, et al. Solving dense linear systems on platforms with multiple hardware accelerators, 2009, PPoPP '09.
[13] Hyesoon Kim, et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Karsten Schwan, et al. A framework for dynamically instrumenting GPU compute applications within GPU Ocelot, 2011, GPGPU-4.
[15] Ping Yao, et al. CuHMMer: A load-balanced CPU-GPU cooperative bioinformatics application, 2010, 2010 International Conference on High Performance Computing & Simulation.