System-level Fault-Tolerance Analysis of Small Satellite On-Board Computers

Commercial Off-The-Shelf (COTS) electronic components offer cost-effective solutions for developing On-Board Computers (OBCs) in the small satellite industry. However, COTS parts are not designed to withstand the space radiation environment. Traditional fault-tolerance practices rely on expensive radiation tests or on circuit-level knowledge that is not readily available. This work proposes a novel simulation-based statistical approach to assist satellite designers in performing OBC fault-tolerance analysis. The approach is based on high-level system modeling and an object-oriented fault injection mechanism. It allows fault-tolerance techniques to be compared and reveals the consequences of radiation effects in COTS parts at early development stages. The work covers the implementation of the proposed simulation framework, which includes the OBC and fault modeling. The fault models are based on a radiation environment analysis conducted as part of the work. A range of software and hardware fault detection and mitigation techniques is investigated as case studies: time and hardware Triple-Modular Redundancy, FPGA-based memory scrubbing with Hamming encoding, and watchdog/co-processor monitoring. The case studies show that the proposed approach can be used to choose suitable fault-tolerance techniques, increase their efficiency, and reduce the required hardware resources.

Three papers are included:
- SystemC-based On-board Computer Modeling for Design Fault-Tolerance Assessment
- A Simulator of On-Board Computers for Evaluating Fault-Mitigation Techniques
- System Fault-tolerance Analysis of Small Satellite On-board Computers
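The idea of statistical fault injection for comparing mitigation techniques can be illustrated with a minimal sketch. It is not the framework described in this work, which models the OBC at system level. It is a toy Monte Carlo campaign in Python that injects single-bit upsets into a stored word and compares an unprotected copy against a bitwise Triple-Modular Redundancy voter. All names (`flip_random_bit`, `majority_vote`, `run_campaign`) and parameters (`upset_prob`, `width`, `seed`) are hypothetical, and the per-copy upset probability is an arbitrary input, not a value derived from any radiation analysis.

```python
import random


def flip_random_bit(word: int, width: int, rng: random.Random) -> int:
    """Model a single-event upset: invert one randomly chosen bit."""
    return word ^ (1 << rng.randrange(width))


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority voter used by hardware TMR."""
    return (a & b) | (a & c) | (b & c)


def run_campaign(trials: int, upset_prob: float, width: int = 32, seed: int = 0):
    """Inject upsets into three copies of a word and count failures.

    Returns (unprotected_failure_rate, tmr_failure_rate). The first copy
    doubles as the unprotected baseline. upset_prob is an assumed
    per-copy, per-trial upset probability.
    """
    rng = random.Random(seed)
    unprotected_failures = 0
    tmr_failures = 0
    for _ in range(trials):
        golden = rng.getrandbits(width)
        copies = [golden, golden, golden]
        for i in range(3):
            if rng.random() < upset_prob:
                copies[i] = flip_random_bit(copies[i], width, rng)
        if copies[0] != golden:
            unprotected_failures += 1
        if majority_vote(*copies) != golden:
            tmr_failures += 1
    return unprotected_failures / trials, tmr_failures / trials


if __name__ == "__main__":
    unprotected, tmr = run_campaign(trials=20000, upset_prob=0.1)
    print(f"unprotected failure rate: {unprotected:.4f}")
    print(f"TMR failure rate:         {tmr:.4f}")
```

In this toy model the voter fails only when two copies are upset in the same bit position, so its failure rate should be far below that of the unprotected copy. A realistic comparison would also need to account for the voter's own vulnerability and the extra hardware resources that TMR requires.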
