qCUDA: GPGPU Virtualization for High Bandwidth Efficiency

The increasing demand for machine learning computation is driving the convergence of high-performance computing and cloud computing, which makes the virtualization of Graphics Processing Units (GPUs) a critical issue. Although many GPGPU virtualization frameworks have been proposed, their performance is limited by the bandwidth of data transfers between the virtual machine (VM) and the host. In this paper, we present qCUDA, a virtualization framework that improves the performance of Compute Unified Device Architecture (CUDA) programs running in VMs. qCUDA is built on the virtio framework. It consists of a para-virtualized driver in the guest and a device module on the host side, which interact through API remoting and memory management methods. In our test environment, qCUDA achieves more than 95% of native bandwidth efficiency in most results. qCUDA also provides flexibility and interposition: it can execute CUDA-compatible programs in both Linux and Windows VMs on the QEMU-KVM hypervisor.
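
API remoting, as described above, means that a guest-side stub does not execute a CUDA call itself. Instead, it serializes the call and forwards it to the host, where the real runtime runs it. The sketch below illustrates that idea under loud assumptions. The command codes (`CMD_MALLOC`, `CMD_MEMCPY_H2D`, `CMD_FREE`), the `<IQ` header layout, and the `HostDevice` class are hypothetical and do not describe qCUDA's actual protocol. A plain function call also stands in for the virtio transport, and a Python dictionary stands in for GPU memory. The `bandwidth_efficiency` helper assumes that "bandwidth efficiency" means virtualized bandwidth divided by native bandwidth.

```python
import struct
from dataclasses import dataclass, field

# Hypothetical command codes; the real qCUDA command set is not given here.
CMD_MALLOC = 1
CMD_MEMCPY_H2D = 2
CMD_FREE = 3

# Hypothetical request header: command id (u32) + one argument (u64), little-endian.
HEADER = struct.Struct("<IQ")


def encode(cmd: int, arg: int, payload: bytes = b"") -> bytes:
    """Guest-side stub: serialize one API call into a request buffer."""
    return HEADER.pack(cmd, arg) + payload


@dataclass
class HostDevice:
    """Host-side handler; a dict of bytearrays stands in for device memory."""
    memory: dict = field(default_factory=dict)
    next_ptr: int = 0x1000

    def handle(self, request: bytes) -> int:
        """Decode a request and 'execute' it, returning the call's result."""
        cmd, arg = HEADER.unpack_from(request)
        payload = request[HEADER.size:]
        if cmd == CMD_MALLOC:          # arg = size in bytes
            ptr = self.next_ptr
            self.memory[ptr] = bytearray(arg)
            self.next_ptr += max(arg, 1)
            return ptr
        if cmd == CMD_MEMCPY_H2D:      # arg = destination pointer
            buf = self.memory[arg]
            if len(payload) > len(buf):
                raise ValueError("copy exceeds allocation")
            buf[:len(payload)] = payload
            return len(payload)
        if cmd == CMD_FREE:            # arg = pointer to release
            del self.memory[arg]
            return 0
        raise ValueError(f"unknown command {cmd}")


def bandwidth_efficiency(virtual_gbps: float, native_gbps: float) -> float:
    """Assumed definition: virtualized bandwidth as a fraction of native."""
    return virtual_gbps / native_gbps
```

For example, `host.handle(encode(CMD_MEMCPY_H2D, ptr, data))` stands in for a guest `cudaMemcpy` to the device. Because every payload byte crosses the guest-host boundary, the cost of that transfer path determines bandwidth efficiency. This is why the abstract emphasizes memory management alongside API remoting.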
