A Pattern Specification and Optimizations Framework for Accelerating Scientific Computations on Heterogeneous Clusters
[1] Gagan Agrawal,et al. Optimizing MapReduce for GPUs with effective shared memory usage , 2012, HPDC '12.
[2] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp.
[3] Siegfried Benkner,et al. Using explicit platform descriptions to support programming of heterogeneous many-core systems , 2012, Parallel Comput..
[4] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] Sunita Chandrasekaran,et al. Exploring Programming Multi-GPUs Using OpenMP and OpenACC-Based Hybrid Model , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[6] Alejandro Duran,et al. Productive Programming of GPU Clusters with OmpSs , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[7] Gagan Agrawal,et al. Accelerating MapReduce on a coupled CPU-GPU architecture , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Scott B. Baden,et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C , 2011, ICS '11.
[9] Robert J. Harrison,et al. Global Arrays: a portable "shared-memory" programming model for distributed memory computers , 1994, Proceedings of Supercomputing '94.
[10] Bruno Raffin,et al. X-kaapi: A Multi Paradigm Runtime for Multicore Architectures , 2013, 2013 42nd International Conference on Parallel Processing.
[11] Eduard Ayguadé,et al. Programmability and portability for exascale: Top down programming methodology and tools with StarSs , 2013, J. Comput. Sci..
[12] Thomas Fahringer,et al. LibWater: heterogeneous distributed computing made easy , 2013, ICS '13.
[13] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[14] Jungwon Kim,et al. A SnuCL implementation of the LINPACK benchmark on clusters with multi-GPU nodes , 2012, HiPC 2012.
[15] David A. Padua,et al. Performance Portability with the Chapel Language , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[16] George Almási,et al. Scalable RDMA performance in PGAS languages , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[17] Vijay Saraswat,et al. GPU programming in a high level language: compiling X10 to CUDA , 2011, X10 '11.
[18] Eric Darve,et al. Liszt: A domain specific language for building portable mesh-based PDE solvers , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[19] Frank Mueller,et al. Auto-generation and auto-tuning of 3D stencil codes on GPU clusters , 2012, CGO '12.
[20] G. R. Mudalige,et al. OP2: An active library framework for solving unstructured mesh-based applications on multi-core and many-core architectures , 2012, 2012 Innovative Parallel Computing (InPar).
[21] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[22] Yi Wang,et al. SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012).
[23] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006.
[24] Jungwon Kim,et al. SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters , 2012, ICS '12.
[25] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998.
[26] Jing Zhang,et al. OpenCL and the 13 dwarfs: a work in progress , 2012, ICPE '12.
[27] Satoshi Matsuoka,et al. Physis: An implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[28] Mitsuhisa Sato,et al. Multiple-SPMD Programming Environment Based on PGAS and Workflow toward Post-petascale Computing , 2013, 2013 42nd International Conference on Parallel Processing.
[29] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[30] Bronis R. de Supinski,et al. Heterogeneous Task Scheduling for Accelerated OpenMP , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[31] Joel H. Saltz,et al. Parallelizing Molecular Dynamics Programs for Distributed Memory Machines: An Application of the Cha , 1994.
[32] Gagan Agrawal,et al. An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs , 2011, ICS '11.