Turning GPUs into Floating Devices over the Cluster: The Beauty of GPU Migration

Virtualization techniques have been shown to benefit data centers and other computing facilities. Virtual machines allow the size of the computing infrastructure to be reduced while increasing overall resource utilization, and virtualizing individual components of computers may provide significant benefits as well. This is the case, for example, for remote GPU virtualization, a technique implemented in several frameworks in recent years. The large degree of flexibility it provides can, however, be further increased by applying a migration mechanism to it, so that the GPU part of an application can be live migrated to another GPU elsewhere in the cluster, transparently and during execution. In this paper we discuss how the migration mechanism has been applied to different GPU virtualization frameworks. We also provide a big picture of the possibilities that migrating the GPU part of applications offers to data centers and other computing facilities. Finally, we present the first results of ongoing work on applying the migration mechanism to the rCUDA remote GPU virtualization framework.
