Reducing dynamic energy of set-associative L1 instruction cache by early tag lookup

To minimize the access latency of set-associative caches, the data in all ways are read out in parallel with the tag lookup. However, this is energy inefficient, as only the data from the matching way is used and the others are discarded. This paper proposes an early tag lookup (ETL) technique for L1 instruction caches that determines the matching way one cycle earlier than the cache access, so that only the matching data way need be accessed. ETL incurs no performance penalty and insignificant hardware overhead. Evaluation on a 4-way set-associative L1 instruction cache in 45nm technology shows that ETL reduces the read energy by 68% on average.

[1]  Rajesh K. Gupta,et al.  Integrated I-cache Way Predictor and Branch Target Buffer to Reduce Energy Consumption , 2002, ISHPC.

[2]  Yale N. Patt,et al.  Increasing the instruction fetch rate via multiple branch prediction and a branch address cache , 1993, ICS '93.

[3]  Kaushik Roy,et al.  Reducing set-associative cache energy via way-prediction and selective direct-mapping , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[4]  Chia-Lin Yang,et al.  HotSpot cache: joint temporal and spatial locality exploitation for I-cache energy reduction , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[5]  T. N. Vijaykumar,et al.  Reactive-associative caches , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[6]  Margaret Martonosi,et al.  Wattch: a framework for architectural-level power analysis and optimizations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[7]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[8]  Paul D. Franzon,et al.  FreePDK: An Open-Source Variation-Aware Design Kit , 2007, 2007 IEEE International Conference on Microelectronic Systems Education (MSE'07).

[9]  Kazuaki Murakami,et al.  Way-predicting set-associative cache for high performance and low energy consumption , 1999, Proceedings. 1999 International Symposium on Low Power Electronics and Design (Cat. No.99TH8477).

[10]  David B. Whalley,et al.  Speculative tag access for reduced energy dissipation in set-associative L1 data caches , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).

[11]  Dirk Grunwald,et al.  Predictive sequential associative cache , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.

[12]  Frank Vahid,et al.  A way-halting cache for low-energy high-performance systems , 2004, Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758).

[13]  Alexander V. Veidenbaum,et al.  Reducing power consumption for high-associativity data caches in embedded processors , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[14]  Ibrahim N. Hajj,et al.  Architectural and compiler support for energy reduction in the memory hierarchy of high performance microprocessors , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[15]  William H. Mangione-Smith,et al.  The filter cache: an energy efficient memory structure , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[16]  Menglong Guan,et al.  Exploiting Early Tag Access for Reducing L1 Data Cache Energy in Embedded Processors , 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[17]  David B. Whalley,et al.  Reducing set-associative L1 data cache energy by early load data dependence detection (ELD3) , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[18]  Eric Rotenberg,et al.  FabScalar: Composing synthesizable RTL designs of arbitrary cores within a canonical superscalar template , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).

[19]  Jung Ho Ahn,et al.  McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[20]  David Black-Schaffer,et al.  TLC: A tag-less cache for reducing dynamic first level cache energy , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[21]  Mikko H. Lipasti,et al.  Tag Check Elision , 2014, 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED).