Remote GPU Virtualization: Is It Useful?

Graphics Processing Units (GPUs) are currently used in many computing facilities. However, GPUs bring several drawbacks, such as increased acquisition costs and larger space requirements. In addition, GPUs still consume energy while idle, and their utilization is usually low. As with virtual machines, using virtual GPUs may address these concerns. In this regard, remote GPU virtualization allows the GPUs present in a computing facility to be shared among the nodes of the cluster. This would increase overall GPU utilization, thus offsetting the increased costs mentioned above. It could also make it possible to reduce the number of GPUs installed in the cluster. In this paper we explore some of the benefits that remote GPU virtualization brings to clusters. For instance, this mechanism allows an application to use all the GPUs present in a cluster. Another benefit is that cluster throughput, measured as jobs completed per time unit, is doubled when this technique is used. Furthermore, in addition to increasing overall GPU utilization, total energy consumption is reduced by up to 40%. This may be key in the context of exascale computing facilities, which face a significant energy constraint.
