Accurate vulnerability estimation for cache hierarchy

Soft errors caused by energetic particle strikes in on-chip cache memories have become a critical challenge for microprocessor design. Architectural Vulnerability Factor (AVF), which is defined as the probability that a transient fault in the structure would result in a visible error in the final output of a program, has been widely employed for accurate soft error rate estimation. In this paper, we propose an improved AVF estimation method. Using the improved AVF computing method, we estimate the reliability of structures in the cache hierarchy (i.e., L1I, L1D, L2 caches). Experimental results show that L1I is the most reliable component, while L2 cache is the most vulnerable component in the cache hierarchy. L2 cache is demonstrated to be almost 23 times more susceptive than L1I. We also investigate the effects of L1D cache organization on the reliability of L1D and L2 caches. Experimental results show that a larger L1D cache is more vulnerable to soft errors, yet brings an improvement of L2 cache reliability. Besides, we find that increasing cache associativity of L1D cache is of little benefit to improve the reliability of cache hierarchy. Based on the quantitative analysis of reliability of structures in the cache hierarchy, we choose different levels of protection for caches, thus maintaining the reliability goal with minimum costs.

[1]  Michael L. Scott,et al.  Integrating adaptive on-chip storage structures for reduced dynamic power , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[2]  Joel Emer,et al.  Computing Architectural Vulnerability Factors for Address-Based Structures , 2005, ISCA 2005.

[3]  Mehdi Baradaran Tahoori,et al.  A Field Analysis of System-level Effects of Soft Errors Occurring in Microprocessors used in Information Systems , 2008, 2008 IEEE International Test Conference.

[4]  Michael S. Floyd,et al.  Fault - tolerant design of the IBM POWER6™ microprocessor , 2007, 2007 IEEE Hot Chips 19 Symposium (HCS).

[5]  Barry W. Johnson Design & analysis of fault tolerant digital systems , 1988 .

[6]  K. Soumyanath,et al.  Scaling trends of cosmic ray induced soft errors in static latches beyond 0.18 /spl mu/ , 2001, 2001 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.01CH37185).

[7]  Joel S. Emer,et al.  The soft error problem: an architectural perspective , 2005, 11th International Symposium on High-Performance Computer Architecture.

[8]  Wei Zhang,et al.  ICR: in-cache replication for enhancing data cache reliability , 2003, 2003 International Conference on Dependable Systems and Networks, 2003. Proceedings..

[9]  Jiri Gaisler Evaluation of a 32-bit microprocessor with built-in concurrent error-detection , 1997, Proceedings of IEEE 27th International Symposium on Fault Tolerant Computing.

[10]  Todd M. Austin,et al.  A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor , 2003, MICRO.

[11]  Frank Vahid,et al.  A self-tuning cache architecture for embedded systems , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[12]  Changhong Dai,et al.  Impact of CMOS process scaling and SOI on the soft error rates of logic processes , 2001, 2001 Symposium on VLSI Technology. Digest of Technical Papers (IEEE Cat. No.01 CH37184).

[13]  R.C. Baumann,et al.  Radiation-induced soft errors in advanced semiconductor technologies , 2005, IEEE Transactions on Device and Materials Reliability.

[14]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[15]  N. Seifert,et al.  Robust system design with built-in soft-error resilience , 2005, Computer.

[16]  Xiaodong Li,et al.  Architecture-Level Soft Error Analysis: Examining the Limits of Common Assumptions , 2007, 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN'07).

[17]  Arijit Biswas,et al.  Computing architectural vulnerability factors for address-based structures , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[18]  Rajeev Balasubramonian,et al.  Memory hierarchy reconfiguration for energy and performance in general-purpose processor architectures , 2000, MICRO 33.

[19]  Stefan Rusu,et al.  Itanium 2 processor 6M: higher frequency and larger L3 cache , 2004, IEEE Micro.

[20]  Frank Vahid,et al.  A highly configurable cache architecture for embedded systems , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..

[21]  Wei Zhang,et al.  Computing cache vulnerability to transient errors and its implication , 2005, 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'05).

[22]  Mehdi Baradaran Tahoori,et al.  Balancing Performance and Reliability in the Memory Hierarchy , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005..

[23]  Lorenzo Alvisi,et al.  Modeling the effect of technology trends on the soft error rate of combinational logic , 2002, Proceedings International Conference on Dependable Systems and Networks.