Duplicate with Choose: Using Statistics for Fault Mitigation

Jon-Paul Anderson
Department of Electrical and Computer Engineering, BYU
Doctor of Philosophy

This dissertation presents a novel technique called duplicate with choose (DWCh), a modification of the fault detection technique duplicate with compare (DWC). DWCh adds a smart decider block to DWC that monitors the duplicated circuits and, when a fault occurs, decides which circuit is fault free. When the decider chooses correctly, DWCh masks the fault at a lower cost than conventional techniques such as triple modular redundancy (TMR). This dissertation derives reliability expressions for DWCh showing that, under ideal conditions, its reliability exceeds that of TMR, the fault masking technique most commonly used in spacecraft. Under non-ideal conditions, DWCh remains a lower-cost alternative to TMR, but its reliability is also lower.

Three types of DWCh smart deciders were developed for use with digital communications receivers. The first type used histograms as the statistical basis for its decision. The second type used moments. The third type, although not generally applicable to other systems, used a signal common to communications receivers and gave excellent results. The communications receivers were subjected to hardware fault injection to gather datastreams affected by real-world faults. The captured datastreams were then used with Simulink models of the different deciders to quantify their performance and to discover how a practical implementation of DWCh differs from the theoretical model. Compared with a simplex (non-redundant) design, DWCh increased the mean time to failure (MTTF) by 20x to 130x, depending on the specific smart decider tested.
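The reliability comparison with TMR can be illustrated with a minimal sketch. It assumes the textbook model of independent modules that each have reliability R, a perfect TMR voter, and a DWCh decider that picks the fault-free copy with probability c (its coverage). The function names and the coverage parameter are illustrative and are not taken from the dissertation's own derivation. Under this model R_DWCh = R^2 + 2cR(1 - R) and R_TMR = 3R^2 - 2R^3, so an ideal decider (c = 1) gives R_DWCh - R_TMR = 2R(1 - R)^2 >= 0, and, for 0 < R < 1, DWCh is more reliable than TMR exactly when c > R.

```python
"""Reliability of TMR and DWCh under a simple coverage model (illustrative sketch)."""


def r_tmr(r: float) -> float:
    """TMR with a perfect voter: at least two of the three modules must work."""
    return 3 * r**2 - 2 * r**3


def r_dwch(r: float, c: float = 1.0) -> float:
    """Duplicate plus decider with coverage c (assumed model, not the dissertation's).

    Both copies work with probability r**2. Exactly one copy fails with
    probability 2*r*(1 - r); the system then survives only if the decider
    picks the fault-free copy, which it does with probability c.
    """
    return r**2 + 2 * c * r * (1 - r)


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        print(
            f"R={r}: TMR={r_tmr(r):.4f}  "
            f"DWCh(c=1)={r_dwch(r):.4f}  DWCh(c=0.9)={r_dwch(r, 0.9):.4f}"
        )
```

With R = 0.9, for example, TMR gives 0.972, an ideal DWCh gives 0.99, and a DWCh decider with c = 0.9 exactly ties TMR, which is the non-ideal trade-off described above.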
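A histogram-based decider can be sketched as follows. The sketch assumes the decider has a reference histogram describing fault-free output (for example, one built from known-good data), compares each replica's recent samples against it using an L1 distance, and passes the closer replica through whenever the DWC comparator detects a mismatch. The reference histogram, bin range, distance metric, and all function names are hypothetical choices for illustration, not the decider built in the dissertation.

```python
from typing import List, Sequence


def histogram(samples: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram over [lo, hi); out-of-range samples are clamped into the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        k = int((x - lo) // width)
        k = min(max(k, 0), bins - 1)
        counts[k] += 1
    return [count / len(samples) for count in counts]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute bin-by-bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


def choose_by_histogram(a, b, reference, lo, hi) -> int:
    """Return 0 if replica a's histogram is closer to the reference, else 1 (hypothetical)."""
    bins = len(reference)
    da = l1_distance(histogram(a, lo, hi, bins), reference)
    db = l1_distance(histogram(b, lo, hi, bins), reference)
    return 0 if da <= db else 1


def dwch_output(a, b, reference, lo, hi) -> list:
    """DWC comparison, followed by a histogram-based choice only when the replicas disagree."""
    if list(a) == list(b):
        return list(a)  # no mismatch detected: pass the output through
    chosen = choose_by_histogram(a, b, reference, lo, hi)
    return list(a) if chosen == 0 else list(b)
```

For instance, with `reference = [0.0, 0.5, 0.0, 0.5]` over four unit-width bins spanning [-2, 2), a replica emitting alternating +1 and -1 symbols matches the reference exactly, while a replica with a hypothetical stuck-at-zero fault lands entirely in the [0, 1) bin, so `choose_by_histogram` selects the first replica.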
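A moment-based decider can be sketched in the same spirit. This version assumes the expected raw moments of fault-free output are known in advance (for a noiseless stream of +1/-1 symbols, E[x] = 0, E[x^2] = 1, and E[x^4] = 1), computes the same moments for each replica over a window, and picks the replica with the smaller squared deviation. The moment orders (1, 2, 4), the unweighted distance, and the function names are assumptions for illustration; the dissertation's decider may use different moments or thresholds.

```python
from typing import List, Sequence, Tuple

# Assumed moment orders: mean, mean power, and fourth raw moment.
ORDERS: Tuple[int, ...] = (1, 2, 4)


def raw_moments(samples: Sequence[float], orders: Tuple[int, ...] = ORDERS) -> List[float]:
    """Sample raw moments E[x**k] for each k in orders."""
    n = len(samples)
    return [sum(x**k for x in samples) / n for k in orders]


def moment_distance(samples, expected, orders: Tuple[int, ...] = ORDERS) -> float:
    """Sum of squared differences between measured and expected moments."""
    measured = raw_moments(samples, orders)
    return sum((m - e) ** 2 for m, e in zip(measured, expected))


def choose_by_moments(a, b, expected, orders: Tuple[int, ...] = ORDERS) -> int:
    """Return 0 if replica a better matches the expected moments, else 1 (hypothetical).

    Intended to run only after the DWC comparator flags a mismatch.
    """
    da = moment_distance(a, expected, orders)
    db = moment_distance(b, expected, orders)
    return 0 if da <= db else 1
```

A hypothetical gain fault that scales the symbols to +3/-3 leaves the mean at zero but raises the second and fourth moments to 9 and 81, so the decider rejects that replica even though a mean-only check would not.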
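How large the MTTF improvement over simplex can be depends strongly on how often the decider chooses correctly. A deliberately idealized model makes this concrete. Assume output-affecting faults arrive one at a time as a Poisson process with rate lam, each fault strikes only one replica, the faulty replica is repaired (for example by configuration scrubbing) before the next fault arrives, and the decider picks the fault-free replica with probability c. The simplex design fails at its first fault (MTTF = 1/lam), while DWCh survives a geometric number of faults, giving MTTF = 1/(lam(1 - c)) and a ratio of 1/(1 - c). These assumptions and helper names are illustrative and are not the model used in the dissertation. Under this model alone, a 20x ratio corresponds to c = 0.95 and a 130x ratio to c of about 0.992.

```python
import random


def mttf_ratio(c: float) -> float:
    """MTTF(DWCh) / MTTF(simplex) under the idealized repair model (assumed)."""
    if not 0.0 <= c < 1.0:
        raise ValueError("coverage c must satisfy 0 <= c < 1")
    return 1.0 / (1.0 - c)


def coverage_for_ratio(ratio: float) -> float:
    """Inverse of mttf_ratio: decider coverage needed for a target MTTF ratio."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    return 1.0 - 1.0 / ratio


def simulate_ratio(c: float, lam: float = 1.0, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo check of mttf_ratio under the same idealized assumptions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        t = 0.0
        while True:
            t += rng.expovariate(lam)  # time until the next fault
            if rng.random() >= c:      # decider picked the faulty replica
                break                  # system failure
            # otherwise the fault is masked and the faulty replica is scrubbed
        total += t
    return (total / trials) * lam  # normalize by the simplex MTTF of 1/lam
```

The closed form and the simulation agree because a geometric number of exponential inter-fault times is itself exponential with rate lam(1 - c); the simulation is included mainly as a sanity check that is easy to extend with a finite scrub interval.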