Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication
Dhabaleswar K. Panda | Hao Wang | Sreeram Potluri | Ashish Kumar Singh | Devendar Bureddy | Carlos Rosales
[1] Sayantan Sur, et al. Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems, 2007, 2007 IEEE International Conference on Cluster Computing.
[2] Feng Qiu, et al. Zippy: A Framework for Computation and Visualization on a GPU Cluster, 2008, Comput. Graph. Forum.
[3] Guillaume Mercier, et al. Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis, 2009, 2009 International Conference on Parallel Processing.
[4] P. Glaskowsky. NVIDIA's Fermi: The First Complete GPU Computing Architecture, 2009.
[5] John D. Owens, et al. Message passing on data-parallel architectures, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[6] Orion S. Lawlor, et al. Message passing for GPGPU clusters: CudaMPI, 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[7] Sayantan Sur, et al. Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems, 2010, Computer Science - Research and Development.
[8] Sayantan Sur, et al. Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application, 2010, ICS '10.
[9] John E. Stone, et al. An asymmetric distributed shared memory model for heterogeneous parallel systems, 2010, ASPLOS XV.
[10] Jeffrey S. Vetter, et al. Quantifying NUMA and contention effects in multi-GPU systems, 2011, GPGPU-4.
[11] Sayantan Sur, et al. MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters, 2011, Computer Science - Research and Development.
[12] Carlos Rosales, et al. Multiphase LBM Distributed over Multiple GPUs, 2011, 2011 IEEE International Conference on Cluster Computing.
[13] Sayantan Sur, et al. Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows, 2011, EuroMPI.
[14] Sayantan Sur, et al. Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2, 2011, 2011 IEEE International Conference on Cluster Computing.
[15] John D. Owens, et al. Extending MPI to accelerators, 2011, ASBD '11.