On the support of inter-node P2P GPU memory copies in rCUDA

Abstract Although GPUs are being widely adopted in order to noticeably reduce the execution time of many applications, their use presents several side effects such as an increased acquisition cost of the cluster nodes or an increased overall energy consumption. To address these concerns, GPU virtualization frameworks could be used. These frameworks allow accelerated applications to transparently use GPUs located in cluster nodes other than the one executing the program. Furthermore, these frameworks aim to offer the same API as the NVIDIA CUDA Runtime API does, although different frameworks provide different degree of support. In general, and because of the complexity of implementing an efficient mechanism, none of the existing frameworks provides support for memory copies between remote GPUs located in different nodes. In this paper we introduce an efficient mechanism devised for addressing the support for this kind of memory copies among GPUs located in different cluster nodes. Several options are explored and analyzed, such as the use of the GPUDirect RDMA mechanism. We focus our discussion on the rCUDA remote GPU virtualization framework. Results show that is possible to implement this kind of memory copies in such an efficient way that performance is even improved with respect to the original performance attained by CUDA when GPUs located in the same cluster node are leveraged.

[1]  Ramani Duraiswami,et al.  Canny edge detection on NVIDIA CUDA , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[2]  Jack J. Dongarra,et al.  Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems , 2014, Concurr. Comput. Pract. Exp..

[3]  Dhabaleswar K. Panda,et al.  Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters , 2015, 2015 IEEE International Conference on Cluster Computing.

[4]  Carlos Reaño,et al.  A Performance Comparison of CUDA Remote GPU Virtualization Frameworks , 2015, 2015 IEEE International Conference on Cluster Computing.

[5]  Shinpei Kato,et al.  GPUvm: Why Not Virtualizing GPUs at the Hypervisor? , 2014, USENIX Annual Technical Conference.

[6]  Sadaf R. Alam,et al.  Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures , 2013, Concurr. Comput. Pract. Exp..

[7]  Tetsu Narumi,et al.  DS-CUDA: A Middleware to Use Many GPUs in the Cloud Environment , 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.

[8]  Giulio Giunta,et al.  A GPGPU Transparent Virtualization Component for High Performance Computing Clouds , 2010, Euro-Par.

[9]  Vladimir Surkov Parallel option pricing with Fourier Space Time-stepping method on Graphics Processing Units , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[10]  Geoffrey Fox,et al.  Supporting High Performance Molecular Dynamics in Virtualized Clusters using IOMMU, SR-IOV, and GPUDirect , 2015, VEE.

[11]  Vanish Talwar,et al.  GViM: GPU-accelerated virtual machines , 2009, HPCVirt '09.

[12]  Carlos Reaño,et al.  Local and Remote GPUs Perform Similar with EDR 100G InfiniBand , 2015, Middleware Industry.

[13]  Sergio Iserte,et al.  Increasing the Performance of Data Centers by Combining Remote GPU Virtualization with Slurm , 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).

[14]  Dhabaleswar K. Panda,et al.  Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs , 2013, 2013 42nd International Conference on Parallel Processing.

[15]  Sudhakar Yalamanchili,et al.  Red Fox: An Execution Environment for Relational Query Processing on GPUs , 2014, CGO '14.

[16]  Adam Polak,et al.  Counting Triangles in Large Graphs on GPU , 2015, 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).

[17]  Carlos Reaño,et al.  Improving the user experience of the rCUDA remote GPU virtualization framework , 2015, Concurr. Comput. Pract. Exp..

[18]  Kenneth A. Hawick,et al.  Data Parallel Three-Dimensional Cahn-Hilliard Field Equation Simulation on GPUs with CUDA , 2009, PDPTA.