Fine-Grained Fault Tolerance for Process Variation-Aware Caches

Continuous scaling in CMOS fabrication process makes circuits more vulnerable to process variations, which results in variable delay, malfunctioning, and/or leaky circuits. Caches are one of the biggest victims of process variations due to their large sizes and minimal cell features. To mitigate the impacts of process variations on caches, we propose to localize the effects of process variations at a word level, not at the conventional cache set, cache way, or cache line level. Faulty words are disabled or shut down completely and accesses to those words are bypassed to a small set of word-length buffers. This technique is shown to be effective in reducing performance penalty due to process variations and in increasing the parametric yield up to 90% when subjected to the performance constraints.

[1]  Kaushik Roy,et al.  Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS , 2005, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[2]  Madhu Mutyam,et al.  Block remap with turnoff: A variation-tolerant cache design technique , 2008, 2008 Asia and South Pacific Design Automation Conference.

[3]  Trevor Mudge,et al.  MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[4]  Alaa R. Alameldeen,et al.  Trading off Cache Capacity for Reliability to Enable Low Voltage Operation , 2008, 2008 International Symposium on Computer Architecture.

[5]  Margaret Martonosi,et al.  XTREM: a power simulator for the Intel XScale® core , 2004, LCTES '04.

[6]  Trevor Mudge,et al.  On-Chip Cache Device Scaling Limits and Effective Fault Repair Techniques in Future Nanoscale Technology , 2007 .

[7]  Wei Chen,et al.  The 65-nm 16-MB Shared On-Die L3 Cache for the Dual-Core Intel Xeon Processor 7100 Series , 2007, IEEE Journal of Solid-State Circuits.

[8]  Sang Lyul Min,et al.  Languages, Compilers, and Tools for Embedded Systems , 2001, Lecture Notes in Computer Science.

[9]  Hyunjin Lee,et al.  Performance of Graceful Degradation for Cache Faults , 2007, IEEE Computer Society Annual Symposium on VLSI (ISVLSI '07).

[10]  Wei Zhang,et al.  Enhancing data cache reliability by the addition of a small fully-associative replication cache , 2004, ICS '04.

[11]  Yiannakis Sazeides,et al.  Performance-effective operation below Vcc-min , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[12]  Kaushik Roy,et al.  Gated-Vdd: a circuit technique to reduce leakage in deep-submicron cache memories , 2000, ISLPED '00.

[13]  Kaushik Roy,et al.  A process-tolerant cache architecture for improved yield in nanoscale technologies , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[14]  J. Meindl,et al.  The impact of intrinsic device fluctuations on CMOS SRAM cell stability , 2001, IEEE J. Solid State Circuits.