Energy-Efficient Stencil Computations on Distributed GPUs Using Dynamic Parallelism and GPU-Controlled Communication
[1] Holger Fröning, et al. Energy-Efficient Collective Reduce and Allreduce Operations on Distributed GPUs, 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[2] D. Panda, et al. Extending OpenSHMEM for GPU Computing, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[3] Michela Taufer, et al. Performance impact of dynamic parallelism on different clustering algorithms, 2013, Defense, Security, and Sensing.
[4] John Shalf, et al. Exascale Computing Technology Challenges, 2010, VECPAR.
[5] Massimiliano Fatica, et al. Implementing the Himeno benchmark with CUDA on GPU clusters, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[6] Rahul Khanna, et al. RAPL: Memory power estimation and capping, 2010, 2010 ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED).
[7] John D. Owens, et al. Message passing on data-parallel architectures, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[8] Sudhakar Yalamanchili, et al. Coordinated energy management in heterogeneous processors, 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[9] John E. Stone, et al. Quantifying the impact of GPUs on performance and energy efficiency in HPC clusters, 2010, International Conference on Green Computing.
[10] Jeff A. Stuart, et al. A study of Persistent Threads style GPU programming for GPGPU workloads, 2012, 2012 Innovative Parallel Computing (InPar).
[11] Fei Wang, et al. Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism, 2013, IDEAL.
[12] Yi Yang, et al. CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications, 2015, Journal of Computer Science and Technology.
[13] Dhabaleswar K. Panda, et al. Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs, 2013, 2013 42nd International Conference on Parallel Processing.
[14] Fei Wang, et al. Graph-Based Substructure Pattern Mining Using CUDA Dynamic Parallelism, 2013, IDEAL.
[15] Wu-chun Feng, et al. Inter-block GPU communication via fast barrier synchronization, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[16] Lena Oden, et al. GPI2 for GPUs: A PGAS framework for efficient communication in hybrid clusters, 2013, PARCO.
[17] Holger Fröning, et al. GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters, 2013, 2013 IEEE International Conference on Cluster Computing (CLUSTER).
[18] C. Simmendinger, et al. The GASPI API specification and its implementation GPI 2.0, 2013.
[19] Holger Fröning, et al. Analyzing Put/Get APIs for Thread-Collaborative Processors, 2014, 2014 43rd International Conference on Parallel Processing Workshops.
[20] Holger Fröning, et al. InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU, 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.
[21] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.