Owing to the increased computing power and flexibility of GPUs, recent GPUs execute general-purpose parallel applications as well as graphics applications. Programmers can exploit GPGPU (general-purpose computing on GPUs) through the APIs provided by GPU vendors. Unfortunately, the computational resources of a GPU are not fully utilized when executing general-purpose applications, because frequent branch instructions cause the threads within a warp to diverge. To handle this branch problem, several warp formation schemes have been proposed. Intuitively, one would expect a warp formation that achieves higher computational resource utilization to deliver higher performance. Contrary to this expectation, our simulation results show that the warp formation providing better utilization performs worse than the warp formation providing lower utilization. This is because the high-utilization warp formation increases the number of memory requests, which causes a serious memory bottleneck. Therefore, a warp formation that provides high computational utilization cannot guarantee high performance without adequate hardware resources. For this reason, we analyze the correlation between hardware configuration and warp formation. Our simulation results provide guidelines for solving the underutilization problem caused by branch instructions when designing recent GPUs.
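The utilization trade-off described above can be illustrated with a toy model. The sketch below is a simplified, hypothetical model, not the simulator or any specific warp formation scheme evaluated in this work: each thread takes one of two branch paths, a warp serially issues every path taken by at least one of its threads (masking idle lanes), and "regrouping" simply packs threads on the same path into the same warps. The names `WARP_SIZE`, `static_warps`, `regrouped_warps`, and `requests_per_cycle`, the path lengths, and the assumption of one uncoalesced memory request per thread are all illustrative assumptions.

```python
"""Toy model of SIMD lane utilization under branch divergence (hypothetical)."""

WARP_SIZE = 32  # lanes per warp (assumption: 32-wide warps)


def issue_cycles(warp, path_len):
    """Cycles a warp spends in the branch region: each distinct path runs serially."""
    return sum(path_len[p] for p in set(warp))


def utilization(warps, path_len):
    """Fraction of issued lane-cycles that perform useful work."""
    useful = sum(path_len[p] for warp in warps for p in warp)
    total = sum(issue_cycles(warp, path_len) * WARP_SIZE for warp in warps)
    return useful / total


def static_warps(thread_paths):
    """Baseline formation: consecutive thread IDs form a warp."""
    return [thread_paths[i:i + WARP_SIZE]
            for i in range(0, len(thread_paths), WARP_SIZE)]


def regrouped_warps(thread_paths):
    """Hypothetical regrouping: threads on the same path are packed together."""
    return static_warps(sorted(thread_paths))


def requests_per_cycle(warps, path_len, reqs_per_thread=1):
    """Memory demand per issue cycle, assuming each thread issues
    `reqs_per_thread` requests in the region (no coalescing modeled)."""
    cycles = sum(issue_cycles(w, path_len) for w in warps)
    threads = sum(len(w) for w in warps)
    return threads * reqs_per_thread / cycles


if __name__ == "__main__":
    # 64 threads: even thread IDs take the "if" path, odd IDs take "else".
    threads = ["if" if tid % 2 == 0 else "else" for tid in range(64)]
    lengths = {"if": 10, "else": 10}  # instructions per path (made-up values)
    for name, form in (("static", static_warps), ("regrouped", regrouped_warps)):
        warps = form(threads)
        print(name, utilization(warps, lengths), requests_per_cycle(warps, lengths))
```

In this example the static grouping reaches 0.5 utilization while the regrouped one reaches 1.0, but the regrouped warps finish in half the issue cycles, so the same memory requests arrive at twice the rate. This mirrors, in a very simplified way, why higher utilization can shift the bottleneck to the memory system.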