Exploiting Spatial and Temporal Patterns in a High-Performance CPU

In modern computer systems, the growing disparity between processor and memory speeds, known as the memory gap, has become a serious bottleneck. Traditional solutions find it increasingly difficult to bridge this gap, and much effort goes into developing new, more effective ones. An earlier design, the Dual Data Cache (DDC), separates data into two different cache subsystems to increase the effectiveness of the cache. Data are separated according to their predominant type of locality. The modified DDC described here gives the temporal and spatial parts different internal organizations, so that each part makes better use of the characteristics of the data it holds. Simulations show substantial improvements over traditional cache systems, with little increase in chip area and power consumption.
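The idea of routing data to a temporal or a spatial subsystem can be illustrated with a minimal sketch. This is not the DDC implementation: the class names, the block sizes and capacities, the fully associative LRU organization, and the caller-supplied `locality` tag are all assumptions made for illustration. A real design would determine the predominant locality by some mechanism that the abstract does not describe.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache keyed by block number (illustrative simplification)."""

    def __init__(self, num_blocks, block_size):
        self.num_blocks = num_blocks  # capacity in blocks
        self.block_size = block_size  # bytes per block
        self.blocks = OrderedDict()   # block number -> placeholder, in LRU order
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access; return True on a hit, False on a miss."""
        block = address // self.block_size
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[block] = True
        if len(self.blocks) > self.num_blocks:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


class SplitCache:
    """Hypothetical split cache: small blocks for temporal data, large blocks for spatial data."""

    def __init__(self):
        # Assumed parameters, chosen only to make the contrast visible.
        self.temporal = LRUCache(num_blocks=64, block_size=4)
        self.spatial = LRUCache(num_blocks=16, block_size=32)

    def access(self, address, locality):
        # The locality tag is supplied by the caller in this sketch.
        part = self.spatial if locality == "spatial" else self.temporal
        return part.access(address)


cache = SplitCache()
for addr in range(0, 256, 4):       # sequential scan: spatial locality
    cache.access(addr, "spatial")
for _ in range(10):                 # repeated access to one word: temporal locality
    cache.access(1000, "temporal")
print(cache.spatial.hits, cache.spatial.misses)
print(cache.temporal.hits, cache.temporal.misses)
```

In this sketch the sequential scan benefits from the larger spatial blocks, since one miss brings in eight neighbouring words, while the repeatedly used word occupies only a single small block in the temporal part.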
