FlexGPU: A Flexible and Efficient Scheduler for GPU Sharing Systems

The graphics processing unit (GPU) is used extensively in diverse domains, such as finance, machine learning, and image processing. However, a GPU can be underutilized because multiple applications may be unable to share it concurrently owing to memory oversubscription. For example, when several applications that require few computational resources but a large amount of GPU memory run simultaneously, the GPU memory may be insufficient. Consequently, the number of GPU applications that can run at the same time is restricted, which lowers GPU utilization. Insufficient memory can even halt applications that are already running on the GPU. To address this, we propose FlexGPU, which schedules the kernels of applications that share the same GPU according to the kernels' features. This framework 1) schedules each kernel at launch time according to its features to improve GPU utilization and 2) temporarily checkpoints non-dependent content from GPU memory to host memory and later restores it, which relieves memory oversubscription when an out-of-memory failure occurs and allows more kernels to run concurrently on the GPU. Experimental results show that, compared with existing methods, our approach achieves a 7-fold performance improvement in terms of execution time and a 2.5-fold increase in the number of applications executing concurrently.
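The checkpoint/restore mechanism (point 2 above) can be illustrated with a toy model. This is a hypothetical sketch, not FlexGPU's implementation. `ToyGPU`, `Buffer`, the largest-first eviction order, and the assumption that each kernel declares the buffers it depends on are all illustrative choices. Memory is modeled as plain integers, and no real device-to-host copies take place.

```python
from dataclasses import dataclass, field


@dataclass
class Buffer:
    """A device allocation owned by one application (hypothetical model)."""
    name: str
    size: int               # abstract memory units
    on_device: bool = False  # allocations start host-side and are placed on first use


@dataclass
class ToyGPU:
    """Toy shared-GPU memory pool; an assumption-laden sketch, not FlexGPU itself."""
    capacity: int
    buffers: dict = field(default_factory=dict)
    host_copies: list = field(default_factory=list)  # names of checkpointed buffers, in order
    launched: list = field(default_factory=list)     # kernels that were admitted

    def used(self) -> int:
        return sum(b.size for b in self.buffers.values() if b.on_device)

    def free(self) -> int:
        return self.capacity - self.used()

    def allocate(self, name: str, size: int) -> None:
        self.buffers[name] = Buffer(name, size)

    def launch(self, kernel: str, needs: set) -> bool:
        """Make every buffer in `needs` resident, checkpointing others if required.

        Returns False (and changes nothing) if the kernel's own working set
        exceeds the device capacity, since eviction cannot help in that case.
        """
        if sum(self.buffers[n].size for n in needs) > self.capacity:
            return False
        # Memory that must be brought onto the device for this kernel.
        required = sum(self.buffers[n].size for n in needs
                       if not self.buffers[n].on_device)
        # Non-dependent resident buffers are eviction candidates, largest first.
        victims = sorted(
            (b for b in self.buffers.values() if b.on_device and b.name not in needs),
            key=lambda b: b.size,
            reverse=True,
        )
        for victim in victims:
            if self.free() >= required:
                break
            victim.on_device = False           # models copy device -> host, then free
            self.host_copies.append(victim.name)
        # Restore (or first-place) the buffers this kernel depends on.
        for n in needs:
            self.buffers[n].on_device = True   # models copy host -> device
        self.launched.append(kernel)
        return True
```

For example, with `capacity=10`, launching a kernel that needs `A.weights` (6 units) and then one that needs `B.input` (5 units) checkpoints `A.weights` to host memory instead of failing with an out-of-memory error. Evicting only buffers the launching kernel does not depend on is what keeps the kernel's correctness intact; the eviction order is a free design choice in this sketch.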
