gQoS: A QoS-Oriented GPU Virtualization with Adaptive Capacity Sharing

Virtualization technologies for cloud computing infrastructures still need further development and refinement to support additional devices such as GPUs. The need is particularly evident in resource sharing and allocation under performance constraints, such as quality of service (QoS) guarantees, given that GPU platforms are closed. This deficiency significantly limits the range of applications for cloud platforms, which aim to support the efficient and smooth execution of business and academic workloads. This paper introduces gQoS, an adaptive capacity sharing system for virtualized GPU resources under a QoS target. gQoS shares and allocates the virtualized GPU resource among workloads adaptively while guaranteeing the QoS level with stability and accuracy. We evaluate the workloads and compare the gQoS strategy with other allocation strategies. The experiments show that gQoS provides much better accuracy and stability in QoS control and reduces total GPU resource utilization by up to 25.85 percent compared with other strategies.
