Supplementary Materials to Adaptive and Transparent Cache Bypassing for GPUs

This document is the supplementary supporting le to the corresponding SC-15 conference paper titled Adaptive and Transparent Cache Bypassing for GPUs. In this document, we rst show the experiment gures for the four extra GPU platforms that cannot t into the original paper due to page limitation. We then show the simulation results for the hardware approach that attempts to reduce bypass overhead. Finally, we analyze the performance patterns of the applications with respect to dierent bypassing threshold, which may explain why certain applications can benet signicantly from cache bypassing than others.

[1]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[2]  Henk Corporaal,et al.  A detailed GPU cache model based on reuse distance theory , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[3]  Nam Sung Kim,et al.  GPUWattch: enabling energy optimizations in GPGPUs , 2013, ISCA.

[4]  Naga K. Govindaraju,et al.  Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[5]  Lifan Xu,et al.  Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).

[6]  Wen-mei W. Hwu,et al.  Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .

[7]  Henry Wong,et al.  Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[8]  Richard W. Vuduc,et al.  Many-Thread Aware Prefetching Mechanisms for GPGPU Applications , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.