gMig: Efficient GPU Live Migration Optimized by Software Dirty Page for Full Virtualization

This paper introduces gMig, an open-source and practical GPU live migration solution for full virtualization. By exploiting the page-dirtying pattern of GPU workloads, gMig combines One-Shot Pre-Copy with a hashing-based Software Dirty Page technique to achieve efficient GPU live migration. Specifically, we propose three techniques for gMig: 1) Dynamic Graphics Address Remapping, which parses and manipulates GPU commands to adjust the address mapping so that it fits the different environment after migration; 2) Software Dirty Page, which uses a hashing-based approach to detect page modification, overcomes the hardware limitation of commodity GPUs, and speeds up migration by sending only the dirtied pages; and 3) One-Shot Pre-Copy, which greatly reduces the number of pre-copy rounds for graphics memory. Our evaluation shows that gMig achieves GPU live migration with an average downtime of 302 ms on Windows and 119 ms on Linux. With Software Dirty Page, the number of GPU pages transferred during the downtime is reduced by 80.0%.
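
The hashing-based detection behind Software Dirty Page can be illustrated with a minimal sketch. It is not gMig's implementation: the function names, the 4 KiB page size, and the choice of BLAKE2b as the hash are assumptions made only for illustration. The idea shown is that a digest is recorded for every page when it is first copied, and a page is resent later only if its digest has changed.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size, not taken from the paper


def hash_pages(memory, page_size=PAGE_SIZE):
    """Return one short digest per page of `memory` (hypothetical helper)."""
    view = memoryview(memory)
    return [
        hashlib.blake2b(view[off:off + page_size], digest_size=8).digest()
        for off in range(0, len(view), page_size)
    ]


def find_dirty_pages(old_hashes, memory, page_size=PAGE_SIZE):
    """Return indices of pages whose digest differs from the earlier snapshot."""
    new_hashes = hash_pages(memory, page_size)
    return [
        index
        for index, (old, new) in enumerate(zip(old_hashes, new_hashes))
        if old != new
    ]


# Usage: snapshot the digests, let the workload write, then rescan.
gpu_memory = bytearray(4 * PAGE_SIZE)      # stand-in for graphics memory
snapshot = hash_pages(gpu_memory)          # taken when the pages are pre-copied
gpu_memory[2 * PAGE_SIZE + 10] = 0xFF      # the workload modifies page 2
dirty = find_dirty_pages(snapshot, gpu_memory)
print(dirty)  # only these pages would need to be sent again
```

A short digest keeps the per-page bookkeeping small, at the cost of a small probability that a modified page hashes to the same value and is missed; a real system would have to weigh digest size against that risk.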
