Application Characteristics-Aware Sporadic Cache Bypassing for High-Performance GPGPUs

Abstract: Modern graphics processing units (GPUs) with massively parallel architectures can boost the performance of both graphics and general-purpose applications. With the support of new programming tools, GPUs have become one of the most attractive platforms for exploiting high thread-level parallelism. Recent GPUs employ hierarchical cache memories to support irregular memory-access patterns. However, the L1 data cache is inefficient in GPUs, mainly because of cache contention and resource congestion. This paper shows that the L1 data cache does not always improve application performance; in fact, many applications are slowed down by its use. We propose a novel cache bypassing mechanism (CARB) to increase the efficiency of GPU cache management and to improve GPU performance. CARB exploits the characteristics of the currently executing application to estimate the performance impact of the L1 data cache on the GPU, and then allows memory requests to bypass the cache in discrete phases during execution. The bypassing decision is made adaptively at runtime. Experimental results show that CARB achieves an average speedup of 22% across a wide range of GPGPU applications.
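The phase-based, runtime-adaptive decision described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual CARB heuristic, which the abstract does not specify. It assumes two hypothetical indicators, the L1 miss rate (a proxy for cache contention) and the fraction of cycles stalled on L1 resources (a proxy for resource congestion), and the class names, field names, and thresholds are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhaseStats:
    """Counters sampled over one execution phase (hypothetical names)."""
    l1_accesses: int
    l1_misses: int
    stall_cycles: int   # cycles lost to L1 resource congestion
    total_cycles: int


class BypassController:
    """Decides, once per phase, whether the next phase bypasses the L1.

    Assumption: the thresholds are illustrative, not values from the paper.
    """

    def __init__(self, miss_threshold: float = 0.8, stall_threshold: float = 0.3):
        self.miss_threshold = miss_threshold
        self.stall_threshold = stall_threshold
        self.bypass = False

    def end_of_phase(self, stats: PhaseStats) -> bool:
        if self.bypass:
            # No L1 statistics exist for a bypassed phase, so re-enable
            # the cache for one phase to resample the application.
            self.bypass = False
            return self.bypass
        miss_rate = stats.l1_misses / stats.l1_accesses if stats.l1_accesses else 0.0
        stall_frac = stats.stall_cycles / stats.total_cycles if stats.total_cycles else 0.0
        # Bypass only when the cache both misses heavily and causes congestion.
        self.bypass = (miss_rate > self.miss_threshold
                       and stall_frac > self.stall_threshold)
        return self.bypass
```

Under these assumptions, a phase with a 95% miss rate and 40% stalled cycles triggers bypassing for the next phase, after which the controller returns to a cached phase to resample. This alternation is one way to realize bypassing in discrete phases rather than for the whole run.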
