Formal analysis of SEU mitigation for early dependability and performability analysis of FPGA-based space applications

Abstract

SRAM-based FPGAs are increasingly popular in the aerospace industry due to their field programmability and low cost. However, they suffer from Single Event Upsets (SEUs) induced by cosmic radiation. In safety-critical applications, the dependability of the design is a prime concern, since failures may have catastrophic consequences. An early analysis of the relationships among dependability metrics, the performability-area trade-off, and different mitigation techniques for such applications can reduce design effort while increasing design confidence. This paper introduces a novel methodology, based on probabilistic model checking, for analyzing the reliability, availability, safety, and performance-area trade-offs of safety-critical systems to support early design decisions. Starting from the high-level description of a system, a Markov reward model is constructed from the Control Data Flow Graph (CDFG) and a component characterization library targeting FPGAs. The proposed model and exhaustive analysis capture all failure states (based on the fault detection coverage) and all possible repairs in the system. We present quantitative results for an FIR filter circuit to illustrate the applicability of the proposed approach and to demonstrate that a wide range of useful dependability and performability properties can be analyzed with the proposed methodology. The modeling results show the relationship between different mitigation techniques and fault detection coverage, exposing their direct impact on the design and supporting early decisions.
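To make the idea of a Markov reward model with fault detection coverage and repair more concrete, here is a minimal Python sketch. It builds a three-state continuous-time Markov chain for a single FPGA module and solves it for steady-state availability and expected reward. This is not the model from the paper. The state space, the SEU rate `lam`, the coverage `c`, the repair rates `mu` (scrubbing after a detected upset) and `nu` (a hypothetical periodic blind scrub that eventually clears undetected upsets), and the throughput reward are all illustrative assumptions. A probabilistic model checker would compute the same kind of long-run quantities from a model description. Here, the linear system πQ = 0 with Σπ = 1 is simply solved directly.

```python
"""Illustrative CTMC / Markov reward sketch for one FPGA module under SEUs.

All states, rates, and rewards are hypothetical. They are chosen only to
show how fault detection coverage and repair enter such a model.
"""

OPERATIONAL, DETECTED, UNDETECTED = 0, 1, 2


def build_generator(lam, c, mu, nu):
    """Return the 3x3 generator matrix Q (rates per hour, assumed units).

    lam: SEU rate; c: fault detection coverage in [0, 1];
    mu: repair rate after a detected upset (e.g. targeted scrubbing);
    nu: rate at which undetected upsets are cleared (e.g. blind scrubbing).
    """
    n = 3
    Q = [[0.0] * n for _ in range(n)]
    Q[OPERATIONAL][DETECTED] = c * lam          # upset caught by detection logic
    Q[OPERATIONAL][UNDETECTED] = (1 - c) * lam  # upset escapes detection
    Q[DETECTED][OPERATIONAL] = mu
    Q[UNDETECTED][OPERATIONAL] = nu
    for i in range(n):
        # The diagonal makes each row sum to zero, as a generator requires.
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by Gaussian elimination."""
    n = len(Q)
    # Rows of Q^T are the balance equations. One of them is redundant for an
    # irreducible chain, so it is replaced by the normalization constraint.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    pi = [0.0] * n
    for i in reversed(range(n)):
        pi[i] = (b[i] - sum(A[i][k] * pi[k] for k in range(i + 1, n))) / A[i][i]
    return pi


def expected_reward(pi, reward):
    """Long-run expected reward: sum over states s of pi[s] * reward[s]."""
    return sum(p * r for p, r in zip(pi, reward))


if __name__ == "__main__":
    Q = build_generator(lam=0.01, c=0.9, mu=10.0, nu=1.0)
    pi = steady_state(Q)
    # Hypothetical reward: full throughput only while operational.
    throughput = [100.0, 0.0, 0.0]
    print("availability:", pi[OPERATIONAL])
    print("P(undetected error):", pi[UNDETECTED])
    print("expected throughput:", expected_reward(pi, throughput))
```

For this toy chain, the closed form π₀ = 1 / (1 + cλ/μ + (1 − c)λ/ν) can be used to check the solver. The closed form also shows why coverage matters. Raising c moves probability mass away from the undetected (potentially unsafe) state, because detected upsets are repaired faster than undetected ones whenever μ > ν.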