On the benefits of the remote GPU virtualization mechanism: The rCUDA case

Graphics processing units (GPUs) are being adopted in many computing facilities because of their extraordinary computing power, which makes it possible to accelerate many general-purpose applications from different domains. However, GPUs also have several drawbacks: they increase acquisition costs, require more space, and need more powerful energy supplies. Furthermore, GPUs still consume energy while idle, and their utilization is usually low for most workloads. As with virtual machines, the use of virtual GPUs may address these concerns. In this regard, the remote GPU virtualization mechanism allows an application running on one node of the cluster to transparently use the GPUs installed in other nodes. Moreover, this technique allows the GPUs present in the computing facility to be shared among the applications running in the cluster. In this way, several applications running on different (or the same) cluster nodes can share one or more GPUs located in other nodes of the cluster. Sharing GPUs should increase overall GPU utilization, thus reducing the negative impact of the drawbacks mentioned above. Reducing the total number of GPUs installed in the cluster may also be possible. In this paper, we explore some of the benefits that remote GPU virtualization brings to clusters. For instance, this mechanism allows an application to use all the GPUs present in the computing facility. Another benefit is that cluster throughput, measured as jobs completed per time unit, increases noticeably and can be doubled for some workloads. Furthermore, besides increasing overall GPU utilization, the technique can reduce total energy consumption by up to 40%. This may be key in the context of exascale computing facilities, which face significant energy constraints. Other benefits are related to the cloud computing domain, where a GPU can be easily shared among several virtual machines. Finally, GPU migration (and therefore server consolidation) is one more benefit of this novel technique.
