Statistical Reliability Estimation of Microprocessor-Based Systems

What is the probability that the execution state of a given microprocessor running a given application is correct, in a certain working environment with a given soft-error rate? Answering this question with fault injection can be very expensive and time-consuming. This paper proposes the baseline for a new methodology, based on microprocessor error probability profiling, that aims to estimate fault injection results without the need for a typical fault injection setup. The proposed methodology rests on two main ideas: a one-time fault-injection analysis of the microprocessor architecture, which characterizes the probability that each of its instructions executes successfully in the presence of a soft error, and a very fast static analysis of the control and data flow of the target software application, which computes the application's probability of success. The presented work goes beyond the dependability evaluation problem; it also has the potential to become the backbone for new tools that help engineers choose the hardware and software architecture that structurally maximizes the probability of a correct execution of the target software.
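The two ideas above can be illustrated with a minimal sketch. It is not the authors' model: it assumes, purely for illustration, that errors in different instructions are independent, so that the success probability of a straight-line instruction sequence is the product of the per-instruction success probabilities. The function name `program_success_probability`, the opcode names, and all numeric values are hypothetical.

```python
import math
from typing import Dict, List


def program_success_probability(trace: List[str],
                                p_success: Dict[str, float]) -> float:
    """Estimate the success probability of an instruction sequence.

    Assumption (not from the paper): instruction outcomes are independent,
    so the overall probability is the product of the per-instruction values.
    Logarithms are summed to avoid underflow on long sequences.
    """
    log_p = 0.0
    for opcode in trace:
        p = p_success[opcode]
        if p <= 0.0:
            # One instruction that always fails makes the whole run fail.
            return 0.0
        log_p += math.log(p)
    return math.exp(log_p)


# Hypothetical per-instruction profile, standing in for the result of the
# one-time fault-injection characterization of the microprocessor.
profile = {"load": 0.995, "add": 0.999, "branch": 0.990}

# Hypothetical instruction sequence, standing in for the output of the
# static control- and data-flow analysis of the application.
trace = ["load", "add", "add", "branch"]

result = program_success_probability(trace, profile)
print(f"estimated probability of success: {result:.6f}")
```

In this simplified form, the profile is computed once per microprocessor and reused across applications, while only the cheap per-application pass over the instruction sequence is repeated. Branches, loops, and error masking along the data flow would need a richer model than a plain product.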
