TLC: A tag-less cache for reducing dynamic first level cache energy

First level caches are performance-critical and are therefore optimized for speed. To do so, modern processors reduce the miss ratio by using set-associative caches and optimize latency by reading all ways in parallel with the TLB and tag lookup. However, this wastes energy since only data from one way is actually used. To reduce energy, phased-caches and way-prediction techniques have been proposed wherein only data of the matching/predicted way is read. These optimizations increase latency and complexity, making them less attractive for first level caches. Instead of adding new functionality on top of a traditional cache, we propose a new cache design that adds way index information to the TLB. This allow us to: 1) eliminate extra data array reads (by reading the right way directly), 2) avoid tag comparisons (by eliminating the tag array), 3) filter out misses (by checking the TLB), and 4) amortize the TLB lookup energy (by integrating it with the way information). In addition, the new cache can directly replace existing caches without any modification to the processor core or software. This new Tag-Less Cache (TLC) reduces the dynamic energy for a 32 kB, 8-way cache by 78% compared to a VIPT cache without affecting performance.

[1]  Margaret Martonosi,et al.  Computer Architecture Techniques for Power-Efficiency , 2008, Computer Architecture Techniques for Power-Efficiency.

[2]  David Eklov,et al.  Efficient software-based online phase classification , 2011, 2011 IEEE International Symposium on Workload Characterization (IISWC).

[3]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[4]  Kaushik Roy,et al.  Reducing set-associative cache energy via way-prediction and selective direct-mapping , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[5]  Andreas Moshovos,et al.  A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[6]  Norman P. Jouppi,et al.  CACTI 6.0: A Tool to Model Large Caches , 2009 .

[7]  Yale N. Patt,et al.  The V-Way cache: demand-based associativity via global replacement , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[8]  Somayeh Sardashti,et al.  The gem5 simulator , 2011, CARN.

[9]  T. N. Vijaykumar,et al.  Reactive-associative caches , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[10]  Chenxi Zhang,et al.  Two fast and high-associativity cache schemes , 1997, IEEE Micro.

[11]  Michael M. Swift,et al.  Reducing memory reference energy with opportunistic virtual caching , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).

[12]  André Seznec,et al.  A case for two-way skewed-associative caches , 1993, ISCA '93.

[13]  Christoforos E. Kozyrakis,et al.  The ZCache: Decoupling Ways and Associativity , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[14]  Michel Dubois,et al.  The Synonym Lookaside Buffer: A Solution to the Synonym Problem in Virtual Caches , 2008, IEEE Transactions on Computers.

[15]  John L. Henning SPEC CPU2006 benchmark descriptions , 2006, CARN.

[16]  Dirk Grunwald,et al.  Predictive sequential associative cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[17]  Steven K. Reinhardt,et al.  A fully associative software-managed cache design , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[18]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[19]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).