Measuring the Impact of Conguration Parameters

The threadblock size and shape choice is one of the most important user decisions when a parallel problem is coded to run in GPU architectures. In fact, threadblock conguration has a signicant

[1]  Hyesoon Kim,et al.  An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.

[2]  Arturo González-Escribano,et al.  Using Fermi Architecture Knowledge to Speed up CUDA and OpenCL Programs , 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.

[3]  Yao Zhang,et al.  A quantitative performance analysis model for GPU architectures , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[4]  Yuri Torres,et al.  Understanding the impact of CUDA tuning techniques for Fermi , 2011, 2011 International Conference on High Performance Computing & Simulation.

[5]  Andreas Moshovos,et al.  Demystifying GPU microarchitecture through microbenchmarking , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[6]  Xiaoming Li,et al.  A Micro-benchmark Suite for AMD GPUs , 2010, 2010 39th International Conference on Parallel Processing Workshops.