Paper Information - Offloading Communication Control Logic in GPU Accelerated Applications

Offloading Communication Control Logic in GPU Accelerated Applications

NVIDIA GPUDirect is a family of technologies aimed at optimizing data movement among GPUs (P2P) or between GPUs and third-party devices (RDMA). GPUDirect Async, introduced in CUDA 8.0, is a new addition which allows direct synchronization between GPUs and third-party devices. For example, Async allows an NVIDIA GPU to directly trigger and poll for completion of communication operations queued to an InfiniBand Connect-IB network adapter, removing CPU involvement from the critical path in GPU accelerated applications. In this paper, we present the building blocks of GPUDirect Async and explain the supported usage models of this new technology. We also present a performance evaluation using a micro-benchmark and a synthetic stencil benchmark. Finally, we demonstrate the use of Async in a few multi-GPU MPI applications: HPGMG-FV (geometric multi-grid), achieving up to 25% improvement in total execution time; CoMD-CUDA (classical molecular dynamics), reducing communication times by up to 30%; and LULESH2-CUDA, achieving an average performance improvement of 13% in total execution time.
