Cache Power Reduction in Presence of Within-Die Delay Variation Using Spare Ways

The share of leakage in cache power consumption increases with technology scaling. Choosing a higher threshold voltage (Vth) and/or gate-oxide thickness(Tox) for cache transistors improves leakage, but impacts cell delay. We show that due to uncorrelated random within-die delay variation, only some (not all)of cells actually violate the cache delay after the above change. We propose to add a spare cache way to replace delay-violating cache-lines separately in each cache-set. By SPICE and gate-level simulations in a commercial 90 nm process, we show that choosing higher Vth, Tox and adding one spare way to a 4-way 16 KB cache reduces leakage power by 42%, which depending on the share of leakage in total cache power, gives up to 22.59% and 41.37% reduction of total energy respectively in L1 instruction- and L2 unified-cache with a negligible delay penalty, but without sacrificing cache capacity or timing-yield.

[1]  David H. Albonesi,et al.  Selective cache ways: on-demand cache resource allocation , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[2]  Hai Zhou,et al.  Yield-Aware Cache Architectures , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[3]  Koji Inoue,et al.  A low-power I-cache design with tag-comparison reuse , 2004, 2004 International Symposium on System-on-Chip, 2004. Proceedings..

[4]  Koji Nii,et al.  Worst-case analysis to obtain stable read/write DC margin of high density 6T-SRAM-array with local Vth variability , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..

[5]  Edward J. McCluskey,et al.  PADded cache: a new fault-tolerance technique for cache memories , 1999, Proceedings 17th IEEE VLSI Test Symposium (Cat. No.PR00146).

[6]  Vasily G. Moshnyaga,et al.  Low-Power Cache Design , 2018, Low-Power Processors and Systems on Chips.

[7]  Tohru Ishihara,et al.  A cache-defect-aware code placement algorithm for improving the performance of processors , 2005, ICCAD-2005. IEEE/ACM International Conference on Computer-Aided Design, 2005..

[8]  Kaushik Roy,et al.  A process-tolerant cache architecture for improved yield in nanoscale technologies , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[9]  David A. Rennels,et al.  Reducing the frequency of tag compares for low power I-cache design , 1995, ISLPED '95.

[10]  Ke Meng,et al.  Process Variation Aware Cache Leakage Management , 2006, ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design.

[11]  Peter Y. K. Cheung,et al.  Within-die delay variability in 90nm FPGAs and beyond , 2006, 2006 IEEE International Conference on Field Programmable Technology.