Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms
[1] David I. August,et al. Automatic CPU-GPU communication management and optimization , 2011, PLDI '11.
[2] Nathan Bell,et al. Thrust: A Productivity-Oriented Library for CUDA , 2012 .
[3] Raphael Landaverde,et al. An investigation of Unified Memory Access performance in CUDA , 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).
[4] Murat Efe Guney,et al. On the limits of GPU acceleration , 2010 .
[5] Simon See,et al. An Evaluation of Unified Memory Technology on NVIDIA GPUs , 2015, 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[6] Satoshi Matsuoka,et al. CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application , 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[7] Ronan Keryell,et al. Khronos SYCL for OpenCL: a tutorial , 2015, IWOCL.
[8] Chao Liu,et al. A Framework for Developing Parallel Applications with High Level Tasks on Heterogeneous Platforms , 2017, PMAM@PPoPP.
[9] William J. Dally,et al. The GPU Computing Era , 2010, IEEE Micro.
[10] Kevin Skadron,et al. Dymaxion: Optimizing memory access patterns for heterogeneous systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] Miriam Leeser,et al. Design space exploration of GPU Accelerated cluster systems for optimal data transfer using PCIe bus , 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[12] Tianyi David Han,et al. Reducing branch divergence in GPU programs , 2011, GPGPU-4.
[13] Shinpei Kato,et al. GDM: device memory management for GPGPU computing , 2014, SIGMETRICS '14.
[14] Christian Terboven,et al. OpenACC - First Experiences with Real-World Applications , 2012, Euro-Par.
[15] Vijay Saraswat,et al. GPU programming in a high level language: compiling X10 to CUDA , 2011, X10 '11.
[16] Frank Bellosa,et al. GPUswap: Enabling Oversubscription of GPU Memory through Transparent Swapping , 2015, VEE.
[17] Christoph W. Kessler,et al. VectorPU: A Generic and Efficient Data-container and Component Model for Transparent Data Transfer on GPU-based Heterogeneous Systems , 2017, PARMA-DITAM '17.
[18] Mohamed Wahib,et al. Scalable Kernel Fusion for Memory-Bound GPU Applications , 2014, SC14: International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[20] Kai Lu,et al. Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing , 2010, 2010 IEEE International Conference on Cluster Computing.