QoS-aware dynamic resource allocation for spatial-multitasking GPUs

General-purpose computing on GPUs (GPGPU computing) is becoming widely adopted; however, some GPGPU applications fail to fully utilize GPU resources. In these cases, spatial multitasking better exploits the parallelism offered by GPUs by partitioning GPU resources among simultaneously running applications. When one or more of these applications have quality-of-service (QoS) requirements, enough resources must be allocated to them to satisfy those requirements. The remaining resources can be either disabled to reduce power consumption or used to accelerate other applications. However, we observe that the amount of resources a QoS application needs to satisfy its performance requirement depends in part on the co-executing applications. In this paper, we propose a runtime technique that dynamically partitions GPU resources among concurrently running applications, at least one of which has a QoS requirement. We demonstrate that the proposed technique can satisfy a 100% QoS requirement while also achieving either a 7 W reduction in power consumption or a 17.57% performance improvement for co-executing best-effort applications.
