Hardware-Based and Hybrid L1 Data Cache Bypassing to Improve GPU Performance

Intelligent GPU cache bypassing can make more efficient use of GPU memory bandwidth, which in turn can improve GPU performance. In this paper, we study a purely hardware-based GPU cache bypassing method that can be applied to GPU applications without recompiling them. We also introduce a hybrid method that exploits profiling information to further enhance the hardware-based bypassing. Our experimental results show that hardware-based cache bypassing improves performance for most benchmarks, and that the hybrid method achieves performance comparable to state-of-the-art compiler-based bypassing at a much lower profiling cost.
