Low-Power Fault Tolerance for Spacecraft FPGA-Based Numerical Computing

Abstract: Fault tolerance is explored for spacecraft computers employing Field-Programmable Gate Arrays (FPGAs). Techniques are investigated for tolerating Single Event Upsets (SEUs) caused by radiation in the space environment. A new architectural approach is proposed for achieving SEU tolerance that minimizes power and size overhead by reducing the precision with which error checking is done. This Reduced Precision Redundancy (RPR) approach is compared to the traditional Triple Modular Redundancy (TMR) method. A methodology is presented for quantifying the costs and benefits of various performance factors and thereby determining optimal design solutions. This methodology treats reliability as a performance factor that can be traded off against factors such as power, size, and speed. An SEU simulation system is developed for studying the effect of SEUs on actual FPGA circuits. Live proton radiation testing and computer-controlled fault injection simulations demonstrate the effectiveness of RPR and TMR. Computer simulations of power usage demonstrate the savings achieved with RPR: RPR is as reliable as TMR while requiring 1/3 to 1/2 as much power. The effect of the imprecise computations that an RPR system may produce is studied. An image processing application illustrates the type of problem to which RPR can be applied effectively.
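
The contrast between TMR and RPR drawn in the abstract can be illustrated with a minimal software sketch. The abstract does not specify the decision logic, so everything below is an assumption made for illustration: the function names (`tmr_vote`, `truncate`, `rpr_select`), the arrangement of one full-precision module with two reduced-precision replicas, the truncation scheme, and the threshold rule are all hypothetical and are not taken from the paper's design.

```python
# Illustrative sketch only: names, module arrangement, and threshold rule
# are assumptions, not the architecture described in the paper.

def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three full-precision replica outputs."""
    return (a & b) | (a & c) | (b & c)


def truncate(x: int, dropped_bits: int) -> int:
    """Reduced-precision view of a non-negative integer: zero the low bits."""
    return (x >> dropped_bits) << dropped_bits


def rpr_select(full: int, rp1: int, rp2: int, threshold: int) -> int:
    """Hypothetical RPR decision assuming at most one faulty module.

    If the two reduced-precision replicas agree and the full-precision
    result differs from them by more than the threshold, the
    full-precision module is presumed upset and the reduced-precision
    value is output. Otherwise the full-precision result is trusted.
    """
    if rp1 == rp2 and abs(full - rp1) > threshold:
        return rp1
    return full


if __name__ == "__main__":
    dropped_bits = 4
    threshold = (1 << dropped_bits) - 1   # largest error truncation alone can cause
    good = 0b10110110                     # fault-free full-precision output
    rp = truncate(good, dropped_bits)     # fault-free reduced-precision output

    print(rpr_select(good, rp, rp, threshold))             # no fault
    print(rpr_select(good ^ 0x40, rp, rp, threshold))      # upset in a high bit of the full module
    print(rpr_select(good ^ 0x01, rp, rp, threshold))      # upset in a low bit: small, tolerated error
    print(rpr_select(good, rp ^ 0x40, rp, threshold))      # upset in one reduced-precision replica
    print(tmr_vote(good, good ^ 0x40, good))               # TMR masks the same upset exactly
```

In this sketch a large upset in the full-precision module yields a slightly imprecise but bounded output, while a small upset passes through undetected, which is one way to picture the imprecise computations the abstract mentions. TMR, by contrast, masks the upset exactly at the cost of three full-precision modules.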