RI-COTS: Trading performance for reliability improvements in commercial off-the-shelf systems

The flexibility of software-based fault-tolerant approaches in providing the required level of reliability for Commercial Off-The-Shelf (COTS) devices has made them the first choice in designing safety-critical systems. In this paper, we propose a reliability improvement method for COTS-based systems, called RI-COTS. The main idea behind RI-COTS is to establish a tradeoff between the reliability and performance of a COTS system by controlling redundant execution at the instruction level. RI-COTS is implemented on a VHDL model of the LEON2 processor. Our simulation results show that, compared with the most closely related studies, RI-COTS improves fault detection capability by 20% with only 4% performance overhead.
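As a rough illustration of what controlling redundant execution at the instruction level can look like, the following sketch duplicates only those instructions whose vulnerability score reaches a threshold, then reports the resulting cost and coverage. It is not the RI-COTS algorithm: the abstract does not state the selection criterion, so the `Instr` type, the `vulnerability` scores, the cycle counts, and the threshold rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instr:
    """Toy instruction record; every field is a hypothetical value."""
    name: str
    cycles: int           # assumed cost of executing the instruction once
    vulnerability: float  # assumed score in [0, 1]; higher means more fault-prone


def select_for_duplication(program: List[Instr], threshold: float) -> List[bool]:
    """Mark an instruction for redundant execution if its score reaches the threshold."""
    return [ins.vulnerability >= threshold for ins in program]


def performance_overhead(program: List[Instr], dup: List[bool]) -> float:
    """Extra cycles spent on duplicated instructions, relative to the unprotected run."""
    base = sum(ins.cycles for ins in program)
    extra = sum(ins.cycles for ins, d in zip(program, dup) if d)
    return extra / base if base else 0.0


def vulnerability_coverage(program: List[Instr], dup: List[bool]) -> float:
    """Fraction of the total vulnerability score that is re-executed and compared."""
    total = sum(ins.vulnerability for ins in program)
    covered = sum(ins.vulnerability for ins, d in zip(program, dup) if d)
    return covered / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical four-instruction program.
    program = [
        Instr("load", 2, 0.8),
        Instr("add", 1, 0.2),
        Instr("mul", 3, 0.6),
        Instr("store", 2, 0.4),
    ]
    for threshold in (0.0, 0.5, 0.9):
        dup = select_for_duplication(program, threshold)
        print(threshold,
              performance_overhead(program, dup),
              vulnerability_coverage(program, dup))
```

Lowering the threshold protects more instructions at a higher cycle cost, and raising it does the opposite. That single knob is the kind of reliability versus performance tradeoff the abstract describes, reduced here to a toy model.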
