Mitigating multi-bit soft errors in L1 caches using last-store prediction

Recent studies suggest that the rate of spatial multi-bit soft errors will increase with future technology scaling. Unfortunately, conventional techniques for mitigating multi-bit errors (e.g., bit interleaving or stronger coding) are impractical in L1 data caches because of their high power and/or latency overheads. We propose the last-store predictor, a lightweight prediction mechanism that accurately determines when a cache block is written for the last time and writes the data back to the L2 cache, where the longer access latency permits more effective multi-bit error protection. Using a combination of commercial workloads and SPEC CPU2000 benchmarks, we show that write-back L1 data caches are, on average, 42% vulnerable to multi-bit soft errors. Where SECDED ECC fails to mitigate multi-bit errors, our mechanism reduces the multi-bit soft-error vulnerability to 12% on average.
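
The abstract does not describe the predictor's internals, so the following is only a minimal sketch of how a last-store prediction could work, under assumptions that are ours and not the paper's: each block accumulates a signature of the program counters of the stores that wrote it, the signature present at eviction is learned as a "last-store" signature, and a later block whose signature matches a learned one is written back to L2 early and marked clean. The names `LastStorePredictor`, `store`, `evict`, and the truncated-addition hash are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    signature: int = 0   # hash of the store PCs seen since the block was filled
    dirty: bool = False  # True while L1 holds the only up-to-date copy


@dataclass
class LastStorePredictor:
    """Toy signature-based predictor (assumed design, not the paper's)."""
    sig_bits: int = 16
    learned: set = field(default_factory=set)   # signatures seen at eviction
    blocks: dict = field(default_factory=dict)  # resident blocks, keyed by address
    early_writebacks: int = 0

    def store(self, addr: int, pc: int) -> bool:
        """Record a store; return True if it is predicted to be the last one."""
        blk = self.blocks.setdefault(addr, Block())
        mask = (1 << self.sig_bits) - 1
        blk.signature = (blk.signature + pc) & mask  # truncated addition
        blk.dirty = True
        if blk.signature in self.learned:
            # Predicted last store: write the block back to L2 now, so the
            # L1 copy is clean and recoverable if it is later corrupted.
            blk.dirty = False
            self.early_writebacks += 1
            return True
        return False

    def evict(self, addr: int) -> None:
        """On eviction, learn the signature that ended this block's lifetime."""
        blk = self.blocks.pop(addr, None)
        if blk is not None:
            self.learned.add(blk.signature)


if __name__ == "__main__":
    p = LastStorePredictor()
    # First block lifetime: nothing learned yet, so no prediction fires.
    p.store(0x100, pc=0x40)
    p.store(0x100, pc=0x48)
    p.evict(0x100)
    # A second block written by the same store sequence triggers a prediction.
    p.store(0x200, pc=0x40)
    print(p.store(0x200, pc=0x48), p.early_writebacks)
```

In this sketch a misprediction (a further store after an early write-back) simply marks the block dirty again; a real design would also need a policy for unlearning signatures that mispredict, which is omitted here.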
