Variation-tolerant cache by two-layer error control codes

In this paper, we explore two-layer error control codes (ECC), which combine rectangular codes and Hamming product codes in an efficient way to address process and supply voltage variation in caches. Two-layer ECC uses simple rectangular codes in each cache line to detect errors and loads the additional Hamming product code check bits only when an error is detected, thus enabling a cache design that tolerates process and supply voltage variation. Our analysis and experimental results show that, compared to a complex 4-way 4EC5ED code, two-layer ECC can increase Mean-Error-To-Failure (METF) by more than 2×, improve reliability by two orders of magnitude under process variation, and reduce the residual failure rate by one order of magnitude under supply voltage variation. Compared to a simple 8-way SECDED code, two-layer ECC improves METF by 28× to 133× and reduces the residual failure rate even further.
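The detect-then-load read path described above can be illustrated with a small toy model. The abstract does not give code parameters, so everything below is an assumption made for illustration: a 16-bit "line" laid out as a 4×4 bit array, a rectangular code taken to be plain row and column parity, and a Hamming product code built from Hamming(7,4) in both dimensions and decoded by simple iterative row/column correction. All function names (`write_line`, `read_line`, and so on) are hypothetical. This is a sketch of the control flow, not the paper's implementation.

```python
"""Toy two-layer ECC read path (illustrative sketch, not the paper's design).

Layer 1: rectangular code (row + column parity) stored with the line, used only to detect.
Layer 2: Hamming(7,4) x Hamming(7,4) product code; its check bits are fetched only on detection.
Line size (4x4 bits) and all names are hypothetical assumptions.
"""
from typing import Dict, List, Tuple

Matrix = List[List[int]]
DATA_POS = [2, 4, 5, 6]  # 0-based indices of the data bits inside a Hamming(7,4) codeword


def hamming74_encode(d: List[int]) -> List[int]:
    d1, d2, d3, d4 = d
    # Parity bits sit at 1-based positions 1, 2, 4; data bits at positions 3, 5, 6, 7.
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def hamming74_correct(c: List[int]) -> bool:
    """Correct at most one bit in place; return True if a bit was flipped."""
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos  # XOR of the 1-based positions that hold a 1
    if syndrome:
        c[syndrome - 1] ^= 1
    return syndrome != 0


def rect_parity(d: Matrix) -> Tuple[List[int], List[int]]:
    # Row and column parity; note that errors on the 4 corners of a rectangle go unnoticed.
    rows = [sum(r) % 2 for r in d]
    cols = [sum(col) % 2 for col in zip(*d)]
    return rows, cols


def product_encode(d: Matrix) -> Matrix:
    rows = [hamming74_encode(r) for r in d]                     # encode rows: 4 x 7
    cols = [hamming74_encode(list(col)) for col in zip(*rows)]  # encode the 7 columns
    return [list(r) for r in zip(*cols)]                        # transpose back to 7 x 7


def write_line(d: Matrix):
    """Return (layer-1 bits kept with the line, layer-2 check bits kept elsewhere)."""
    full = product_encode(d)
    l2 = {(r, k): full[r][k] for r in range(7) for k in range(7)
          if not (r in DATA_POS and k in DATA_POS)}
    return rect_parity(d), l2


def read_line(d: Matrix, l1, l2: Dict[Tuple[int, int], int], max_iter: int = 4):
    """Return (data, used_layer2)."""
    if rect_parity(d) == l1:
        return d, False  # fast path: no error detected, layer-2 bits are never loaded
    # Slow path: rebuild the 7x7 product codeword from the line plus the fetched check bits.
    c = [[0] * 7 for _ in range(7)]
    for (r, k), bit in l2.items():
        c[r][k] = bit
    for i, r in enumerate(DATA_POS):
        for j, k in enumerate(DATA_POS):
            c[r][k] = d[i][j]
    for _ in range(max_iter):  # iterative hard-decision row/column decoding
        changed = False
        for row in c:
            changed |= hamming74_correct(row)
        for k in range(7):
            col = [c[r][k] for r in range(7)]
            if hamming74_correct(col):
                changed = True
                for r in range(7):
                    c[r][k] = col[r]
        if not changed:
            break
    return [[c[r][k] for k in DATA_POS] for r in DATA_POS], True


line = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0]]
l1, l2 = write_line(line)
faulty = [r[:] for r in line]
faulty[0][0] ^= 1
faulty[0][1] ^= 1  # double-bit error inside one row
data, used_layer2 = read_line(faulty, l1, l2)
```

In the double-bit example, the Hamming(7,4) row decoder alone cannot fix two errors in the same row and flips a third bit instead; after that pass, however, each affected column holds a single error, so the column pass restores the data. An error-free read never touches the layer-2 bits, which is the property that lets the larger check bits stay out of the common access path.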
