Challenge benchmarks that must be conquered to sustain the GPU revolution

The shift from GPUs to GPGPUs has brought many changes to GPU architecture (e.g., more caches, more concurrent kernels, and better synchronization). As GPUs press further into the general-purpose domain, architects must continue to address the performance of challenging workloads. This paper presents a set of challenge benchmarks and their key performance limitations to help direct future GPU architecture research. Our study shows that GPUs will need multiple innovative architectural techniques to execute these applications efficiently if they are to continue making inroads into general-purpose computing.
