DymGPU: Dynamic Memory Management for Sharing GPUs in Virtualized Clouds

gVirt is a full GPU virtualization technique for Intel's integrated GPUs that alleviates the problems of other GPU virtualization techniques such as API remoting and direct pass-through. The original gVirt is known to have an inherent scalability limitation on the number of simultaneous virtual machines (VM). gScale solved this problem by allowing each VM to share a global graphics memory space and copy the entries in a private graphics translation table (GTT) to a physical GTT along with a GPU context switch. However, it still suffers from a large overhead of copying entries between private GTT and physical GTT, which becomes worse when the global graphics memory space allocated for each VM is overlapped. In this paper, we identify that the copy overhead caused by GPU context switch is the major bottleneck in performance improvement and propose a dynamic memory management scheme, called DymGPU, that provides two memory allocation algorithms such as size-based and utilization-based algorithms. While the size-based algorithm allocates memory space based on the memory size required by each VM, the utilization-based algorithm considers GPU utilization of each VM to allocate the memory space. DymGPU is also dynamic in the sense that the global graphics memory space used by each VM is rearranged at runtime by periodically checking idle VMs and GPU utilization of each runnable VM. We have implemented our proposed approach in gVirt and confirmed that the proposed scheme reduces GPU context switch time by up to 53% and improved the overall performance of various GPU applications by up to 39%.