vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines

This paper describes vCUDA, a general-purpose graphics processing unit (GPGPU) computing solution for virtual machines (VMs). vCUDA allows applications executing within VMs to leverage hardware acceleration, which can benefit the performance of a class of high-performance computing (HPC) applications. The key ideas in our design are API call interception and redirection, and a dedicated RPC system for VMs. With API interception and redirection, Compute Unified Device Architecture (CUDA) applications in VMs can access a graphics hardware device transparently and achieve high computing performance. With the dedicated RPC system, vCUDA achieved near-native performance in the current study. We carried out a detailed performance analysis of our framework. Using a number of unmodified official examples from the CUDA SDK and third-party applications in the evaluation, we observed that CUDA applications running with vCUDA exhibited a very low performance penalty compared with the native environment, demonstrating the viability of the vCUDA architecture.
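The interception-and-redirection idea can be illustrated with a minimal Python sketch. It is not vCUDA's actual interface: the names `HostStub`, `GuestLibrary`, `alloc`, `copy_to_device`, `scale`, and `copy_to_host` are hypothetical, the "device" is simulated with Python lists, and a direct in-process function call stands in for the dedicated inter-VM RPC channel. The sketch only shows the general pattern, in which a guest-side library with the same call names as the real one serializes each call and forwards it to a host-side component that owns the hardware.

```python
import pickle


class HostStub:
    """Host side: owns the (simulated) device and executes forwarded calls."""

    # Only these calls may be invoked remotely (hypothetical API surface).
    _EXPORTED = {"alloc", "copy_to_device", "scale", "copy_to_host"}

    def __init__(self):
        self._buffers = {}      # handle -> simulated device buffer
        self._next_handle = 1

    def alloc(self, n):
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = [0.0] * n
        return handle

    def copy_to_device(self, handle, data):
        self._buffers[handle] = list(data)

    def scale(self, handle, factor):
        # Stand-in for a kernel launch on the real device.
        self._buffers[handle] = [x * factor for x in self._buffers[handle]]

    def copy_to_host(self, handle):
        return list(self._buffers[handle])

    def handle_request(self, payload):
        """Decode one serialized call, run it, and return the serialized result."""
        name, args = pickle.loads(payload)
        if name not in self._EXPORTED:
            raise ValueError(f"unsupported call: {name}")
        return pickle.dumps(getattr(self, name)(*args))


class GuestLibrary:
    """Guest side: exposes the same call names but forwards every call."""

    def __init__(self, transport):
        # transport: callable taking request bytes and returning reply bytes.
        self._transport = transport

    def __getattr__(self, name):
        # Invoked only for names not defined on this object, i.e. API calls.
        def forward(*args):
            reply = self._transport(pickle.dumps((name, args)))
            return pickle.loads(reply)
        return forward


if __name__ == "__main__":
    host = HostStub()
    lib = GuestLibrary(host.handle_request)  # in-process stand-in for the RPC link
    h = lib.alloc(3)
    lib.copy_to_device(h, [1.0, 2.0, 3.0])
    lib.scale(h, 2.0)
    print(lib.copy_to_host(h))
```

The application code at the bottom never touches the device directly, which is what makes the redirection transparent to it. In a real system the transport would cross the VM boundary, so its cost would largely determine how close the guest gets to native performance.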
