Soft Error Benchmarking for L2 Cache with PARMA

Low voltage operation and small device sizes reduce the critical charge stored in a SRAM cell making caches more susceptible to soft errors. To counter the problem, a plethora of protection schemes are being proposed to reduce the power and area overheads of cache protection. Accurate methods to measure and compare the effectiveness of such schemes are sorely needed. This paper introduces a unified reliability benchmarking framework with PARMA (Precise Analytical Reliability Model for Architecture). PARMA is a rigorous analytical framework to measure the failure rate in the presence of soft errors under any protection scheme. PARMA continuously and accurately accounts for the increase of failure density due to soft errors affecting program behavior during an execution. We have implemented the PARMA framework on top of a cycle-accurate simulator to benchmark Level-2 cache failure rates for a set of CPU 2000 benchmarks. Three protection schemes are compared: parity, word-level 1-bit ECC and block-level 1-bit ECC.

[1]  Tryggve Fossum,et al.  Cache scrubbing in microprocessors: myth or necessity? , 2004, 10th IEEE Pacific Rim International Symposium on Dependable Computing, 2004. Proceedings..

[2]  A.F. Witulski,et al.  Models and Algorithmic Limits for an ECC-Based Approach to Hardening Sub-100-nm SRAMs , 2007, IEEE Transactions on Nuclear Science.

[3]  Joel S. Emer,et al.  Techniques to reduce the soft error rate of a high-performance microprocessor , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[4]  Brad Calder,et al.  Using SimPoint for accurate and efficient simulation , 2003, SIGMETRICS '03.

[5]  Xiaodong Li,et al.  Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[6]  Xiaodong Li,et al.  SoftArch: an architecture-level tool for modeling and analyzing soft errors , 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).

[7]  Mehdi Baradaran Tahoori,et al.  Vulnerability Analysis of L2 Cache Elements to Single Event Upsets , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[8]  Janak H. Patel,et al.  Reliability of scrubbing recovery-techniques for memory systems , 1990 .

[9]  Joel Emer,et al.  A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[10]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[11]  Massimo Violante,et al.  An accurate analysis of the effects of soft errors in the instruction and data caches of a pipelined microprocessor , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[12]  Robert Ecoffet,et al.  Determination of key parameters for SEU occurrence using 3-D full cell SRAM simulations , 1999 .

[13]  P. Hazucha,et al.  Impact of CMOS technology scaling on the atmospheric neutron soft error rate , 2000 .

[14]  B. Epstein Truncated Life Tests in the Exponential Case , 1954 .

[15]  Jeffrey T. Draper,et al.  Critical Charge Characterization for Soft Error Rate Modeling in 90nm SRAM , 2007, 2007 IEEE International Symposium on Circuits and Systems.

[16]  Gwan S. Choi,et al.  On-chip cache memory resilience , 1998, Proceedings Third IEEE International High-Assurance Systems Engineering Symposium (Cat. No.98EX231).

[17]  Mehdi Baradaran Tahoori,et al.  Reducing Data Cache Susceptibility to Soft Errors , 2006, IEEE Transactions on Dependable and Secure Computing.

[18]  Trevor Mudge,et al.  Razor: a low-power pipeline based on circuit-level timing speculation , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..