A two-level approximate model driven framework for characterizing Multi-Cell Upsets impacts on processors

Abstract: Soft error analysis is essential for striking a good tradeoff between processor design cost (e.g., area and power) and reliability. In this paper, we propose an approximate-model-driven framework for efficient soft error analysis in processors. The proposed framework includes: 1) an approximate Probabilistic Graphical Model (PGM) for Single Bit Upset (SBU) estimation, which uses an average-and-max policy to handle the mapped PGM structure, node parameters, and inference quickly; and 2) an approximate boundary model for the more complex Multi-Cell Upset (MCU) case, which adopts a relax-and-strict approach to reuse the approximate PGM and characterize MCU patterns completely. Comprehensive results confirm that, compared with the state of the art, the proposed two-level methodology based on approximate models achieves a speedup of up to 15.37× with an accuracy loss of only 8.14% on average. Furthermore, the proposed method estimates the complex MCU impacts with a runtime of the same order of magnitude as that of the simple SBU case.
