SFFMap: Set-First Fill mapping for an energy efficient pipelined data cache

Conventionally, consecutively addressed blocks are mapped onto different sets in a cache. In this work, we propose a new block-address mapping, Set-First Fill (SFFMap), for pipelined L1 data caches in which consecutively addressed data blocks are mapped onto the same set. This increases the inter-block spatial locality within a cache set. To exploit SFFMap, we propose to store, and whenever possible access, the most recently used set in the cache's pipeline registers. Further, set-buffer selective access (SSA) and selective update (SSU) techniques are proposed to increase the effectiveness of SFFMap. Our experimental evaluation of in-order and out-of-order processors with an 8-way set-associative data cache shows that SFFMap, together with SSA and SSU, achieves around a 27% reduction in dynamic energy and a 4-5% performance improvement. The proposed techniques require only minor modifications to existing hardware, making them easy to adopt.
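
To illustrate the mapping contrast, the minimal C sketch below compares a conventional set-index computation with an SFFMap-style one. It assumes the SFFMap index is drawn from address bits above log2(associativity) block bits, so that eight consecutive blocks fill one 8-way set before moving to the next; the cache geometry (64 B blocks, 64 sets) and the exact bit selection are illustrative assumptions, not the paper's definitive design.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_OFFSET_BITS 6   /* assumed 64-byte blocks               */
#define NUM_SETS          64  /* e.g. 32 KB, 8-way, 64 B blocks       */
#define ASSOC_BITS        3   /* log2(8-way associativity)            */

/* Conventional mapping: consecutive blocks go to consecutive sets. */
static uint32_t conventional_set(uint64_t addr) {
    return (uint32_t)((addr >> BLOCK_OFFSET_BITS) & (NUM_SETS - 1));
}

/* Assumed SFFMap-style mapping: skip log2(assoc) block-address bits so
 * that eight consecutive blocks share one set (the set is filled first). */
static uint32_t sffmap_set(uint64_t addr) {
    return (uint32_t)((addr >> (BLOCK_OFFSET_BITS + ASSOC_BITS)) & (NUM_SETS - 1));
}

int main(void) {
    /* Walk eight consecutive 64-byte blocks and print their set indices:
     * the conventional mapping spreads them over sets 0..7, while the
     * SFFMap-style mapping keeps them all in one set.                   */
    for (uint64_t addr = 0x1000; addr < 0x1000 + 8 * 64; addr += 64) {
        printf("addr=0x%llx  conventional set=%2u  SFFMap set=%2u\n",
               (unsigned long long)addr,
               conventional_set(addr), sffmap_set(addr));
    }
    return 0;
}
```

Keeping consecutive blocks in one set is what lets the most recently used set, held in the pipeline registers (the set buffer), serve accesses to neighboring blocks without re-reading the data array.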
