Characterizing the performance benefits of fused CPU/GPU systems using FusionSim
We use FusionSim to characterize the performance of the Rodinia benchmarks on fused and discrete CPU/GPU systems. We show that the speedup due to fusion is highly correlated with input data size: for the benchmarks that benefit most from fusion, a 9.72x speedup is possible for small problem sizes, falling to 1.84x for medium or large problem sizes. We also study a simple, software-managed coherence solution for the fused system and find that it imposes a minor performance overhead of 2% for most benchmarks and as high as 5% for some. Finally, we develop an analytical model of the performance benefit to be expected from fusion for applications with a simple communication and computation pattern, and show that FusionSim follows the predicted performance trend.
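The abstract does not state the analytical model itself. As a rough illustration of why the speedup from fusion could shrink as input size grows, the following is a minimal sketch under assumptions that are ours, not the paper's: a discrete system pays a fixed per-transfer overhead plus a bandwidth-limited copy before computing, a fused system pays only the compute time, and compute time scales linearly with input size. All function names, parameters, and numeric values are hypothetical and are not fitted to the reported 9.72x or 1.84x figures.

```python
# Hypothetical first-order model; NOT the paper's analytical model.
# Assumption: discrete time = fixed copy overhead + copy time + compute time,
#             fused time    = compute time only (no copy over an interconnect).

def discrete_time(n_bytes, fixed_overhead_s, bandwidth_bytes_per_s, compute_s_per_byte):
    """Run time on a discrete system: copy the input, then compute."""
    transfer = fixed_overhead_s + n_bytes / bandwidth_bytes_per_s
    return transfer + n_bytes * compute_s_per_byte


def fused_time(n_bytes, compute_s_per_byte):
    """Run time on a fused system: compute only (copy assumed free)."""
    return n_bytes * compute_s_per_byte


def fusion_speedup(n_bytes, fixed_overhead_s, bandwidth_bytes_per_s, compute_s_per_byte):
    """Ratio of discrete to fused run time for one input size."""
    return (
        discrete_time(n_bytes, fixed_overhead_s, bandwidth_bytes_per_s, compute_s_per_byte)
        / fused_time(n_bytes, compute_s_per_byte)
    )


if __name__ == "__main__":
    # Illustrative parameter values only (assumed, not measured).
    overhead = 10e-6      # fixed per-transfer overhead, seconds
    bandwidth = 8e9       # interconnect bandwidth, bytes/second
    compute = 1e-9        # compute cost, seconds per input byte
    for n in (1e3, 1e5, 1e7, 1e9):
        print(f"{n:>12.0f} bytes: speedup {fusion_speedup(n, overhead, bandwidth, compute):.3f}")
```

Under these assumptions the fixed overhead dominates for small inputs, so the speedup is large there and approaches the constant `1 + 1 / (bandwidth * compute)` as the input grows, which is the same qualitative trend the abstract reports.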
[1] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing. 2009 IEEE International Symposium on Workload Characterization (IISWC), 2009.
[2] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator. 2009 IEEE International Symposium on Performance Analysis of Systems and Software, 2009.