CTA-Aware Dynamic Scheduling Scheme for Streaming Multiprocessors in High-Performance GPUs

GPGPUs provide powerful computational capability and are used to execute both graphics and general-purpose applications. Hardware resource utilization is one of the most important factors determining GPGPU performance. Executing multiple applications concurrently on a GPGPU increases the available parallelism, which leads to high resource utilization. However, applications have different execution times depending on their workload sizes. If one application completes earlier than the others, the hardware resources allocated to it become idle, and the GPU suffers from resource underutilization. In this work, a CTA-aware dynamic streaming multiprocessor (SM) scheduling scheme is proposed for multiple-application execution on GPGPUs to manage hardware resources efficiently. Compared to the baseline architecture, the proposed CTA-aware dynamic SM scheduling scheme improves GPU performance by 25.6% on average.
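The underutilization problem described above can be illustrated with a toy model. This sketch is not the proposed CTA-aware scheme, whose details are not given in the abstract. It assumes that each SM retires one CTA per time step, that SMs start out statically partitioned between applications, and that in the "dynamic" variant the SMs freed by a finished application are simply split evenly among the applications that still have CTAs left. The function names `static_makespan`, `dynamic_makespan`, and `utilization` are hypothetical.

```python
import math


def static_makespan(ctas, sms):
    """Each application keeps its fixed SM partition until all applications finish."""
    return max(math.ceil(c / s) for c, s in zip(ctas, sms))


def dynamic_makespan(ctas, sms):
    """SMs of a finished application are reassigned to applications that still have CTAs.

    Toy model (assumption): time advances in unit steps and each SM retires one CTA per step.
    """
    remaining = list(ctas)
    alloc = list(sms)
    t = 0
    while any(r > 0 for r in remaining):
        t += 1
        # Each SM retires one CTA of the application that currently owns it.
        for i in range(len(remaining)):
            remaining[i] = max(0, remaining[i] - alloc[i])
        # Reclaim the SMs of applications that have just finished.
        freed = 0
        for i in range(len(remaining)):
            if remaining[i] == 0 and alloc[i] > 0:
                freed += alloc[i]
                alloc[i] = 0
        # Hand the freed SMs to the applications that still have CTAs to run.
        active = [i for i, r in enumerate(remaining) if r > 0]
        for k, i in enumerate(active):
            alloc[i] += freed // len(active) + (1 if k < freed % len(active) else 0)
    return t


def utilization(ctas, total_sms, makespan):
    """Fraction of SM-steps that actually executed a CTA."""
    return sum(ctas) / (total_sms * makespan)


if __name__ == "__main__":
    # Hypothetical workload: 16 SMs split 8/8; app A has 16 CTAs, app B has 80 CTAs.
    ctas, sms = [16, 80], [8, 8]
    s, d = static_makespan(ctas, sms), dynamic_makespan(ctas, sms)
    print("static :", s, "steps, utilization", utilization(ctas, sum(sms), s))
    print("dynamic:", d, "steps, utilization", utilization(ctas, sum(sms), d))
```

In this made-up workload, app A finishes after 2 steps. Under static partitioning its 8 SMs then sit idle while app B needs 10 steps, so utilization is 60%. Reassigning the freed SMs lets app B finish in 6 steps with every SM busy throughout. The numbers only illustrate why idle SMs matter; they are not results from the paper.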
