Survivability Modeling and Resource Planning for Self-Repairing Reconfigurable Device Fabrics

A resilient system design problem is formulated as quantifying the uncommitted reconfigurable resources that a system of components needs to survive its mission lifetime within availability specifications. We show that this survivability metric can be calculated from the residual functionality obtained from pools of dynamically configurable elements, which together constitute the amorphous resource pool (ARP). The ARP is depleted at a rate determined by the failure rate, as its elements are used to replenish functionality lost in the reconfigurable fabric to permanent faults during the mission lifetime. Although genetic algorithms are selected as the repair method, the model covers any probabilistic or deterministic active repair strategy without loss of generality. Model parameters are correlated with reliability specifications of Xilinx Virtex-4 field-programmable gate array (FPGA) devices and then applied to MCNC benchmark circuits along with a realistic space mission. Using the proposed probabilistic survivability model for soft computing-based repair strategies, we calculate the spare fabric resources that must be budgeted for a mission, the maximum mission lifetime, and the repair policy parameters.
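The spare-budgeting question posed above can be illustrated with a minimal sketch. It is not the survivability model proposed in the paper: it assumes, purely for illustration, that permanent faults arrive as a Poisson process with a constant rate and that each repair consumes a fixed number of ARP elements. The names `poisson_cdf`, `min_spares`, `failure_rate`, `mission_time`, `target`, and `resources_per_repair` are all hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """Return P(X <= k) for X ~ Poisson(mean)."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(X = i) from P(X = i - 1)
        total += term
    return total


def min_spares(failure_rate, mission_time, target, resources_per_repair=1):
    """Smallest spare budget such that, with probability >= target,
    the pool is not exhausted before the mission ends.

    Assumptions (illustrative, not taken from the paper):
      - permanent faults follow a Poisson process with constant failure_rate
      - every fault is repaired using resources_per_repair spare elements
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    mean_faults = failure_rate * mission_time
    faults = 0
    # Increase the tolerated fault count until the target probability is met.
    while poisson_cdf(faults, mean_faults) < target:
        faults += 1
    return faults * resources_per_repair


if __name__ == "__main__":
    # Hypothetical numbers: 0.2 faults per year over a 10-year mission.
    print(min_spares(failure_rate=0.2, mission_time=10.0, target=0.99))
```

Under these assumptions, a longer mission or a stricter target can only raise the required budget. A repair strategy that needs more resources per fault, such as an evolutionary search that works over a larger region, scales the budget through `resources_per_repair`.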
