CIFR: A complete in-place fault remapping strategy for CMP cache using dynamic reuse distance

Dynamic voltage and frequency scaling threatens reliability in Chip Multiprocessors (CMPs). Because the cache is the component most susceptible to faults, fault tolerance techniques are needed to ensure error-free execution even when the cache contains faults. Existing fault tolerance techniques do not provide complete fault protection and also reduce the effective capacity of the cache: they either remap faulty blocks to non-conflicting faulty blocks or rely on an auxiliary cache. This work proposes a fault remapping strategy that provides complete fault protection without reducing the effective capacity of the Last Level Cache, by remapping all effective faulty cache lines to either non-conflicting faulty cache lines or low-reusable healthy lines. Reusability is predicted using dynamic reuse distance analysis, and cache lines are ranked by their protecting distance. Only highly reusable faulty lines are considered for remapping to low-reusable non-conflicting faulty lines. If no such line is available, low-reusable healthy lines are used as the target, which removes the need for an auxiliary cache. Cycle-accurate simulation in Multi2Sim 5.0 with a wide range of fault maps on an octa-core CMP architecture shows up to a 38.73% increase in hit ratio over existing fault remapping techniques.
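The two-stage target selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the per-word fault bitmask, the scalar `reuse` score standing in for the protecting-distance ranking, the `reuse_threshold` cut-off, and the names `Line`, `conflicts`, and `plan_remap` are all assumptions made for illustration. In particular, two faulty lines are assumed to be "non-conflicting" when their faulty word positions do not overlap.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Line:
    idx: int          # position of the line in the cache (hypothetical)
    fault_mask: int   # assumed encoding: bit i set => word i is faulty
    reuse: float      # assumed reusability score; higher = more reusable


def conflicts(a: Line, b: Line) -> bool:
    """Assumed conflict test: two lines conflict if they share a faulty word."""
    return (a.fault_mask & b.fault_mask) != 0


def plan_remap(lines: List[Line], reuse_threshold: float) -> Dict[int, int]:
    """Map each highly reusable faulty line to a low-reusable target line."""
    # Sources: faulty lines predicted to be highly reusable, hottest first.
    sources = sorted(
        (l for l in lines if l.fault_mask and l.reuse >= reuse_threshold),
        key=lambda l: -l.reuse,
    )
    # Candidate targets: low-reusable lines, coldest first.
    cold = sorted((l for l in lines if l.reuse < reuse_threshold),
                  key=lambda l: l.reuse)

    mapping: Dict[int, int] = {}
    used: Set[int] = set()
    for src in sources:
        # Stage 1: prefer a low-reusable faulty line that does not conflict.
        target = next(
            (t for t in cold
             if t.idx not in used and t.fault_mask and not conflicts(src, t)),
            None,
        )
        # Stage 2: otherwise fall back to a low-reusable healthy line.
        if target is None:
            target = next(
                (t for t in cold if t.idx not in used and t.fault_mask == 0),
                None,
            )
        if target is not None:
            mapping[src.idx] = target.idx
            used.add(target.idx)
    return mapping


example = [
    Line(0, 0b0011, 0.90),  # hot, faulty
    Line(1, 0b1100, 0.10),  # cold, faulty, no overlap with line 0
    Line(2, 0b0001, 0.80),  # hot, faulty
    Line(3, 0b0000, 0.20),  # cold, healthy
    Line(4, 0b0000, 0.95),  # hot, healthy (never a source or a target)
]
print(plan_remap(example, reuse_threshold=0.5))
```

In this toy input, line 0 takes the only cold non-conflicting faulty line, so line 2 falls back to the cold healthy line. No auxiliary structure is involved, which mirrors the fallback order stated in the abstract.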
