An efficient approach to detect and correct control-flow errors for linear assembly

In the space environment, a large number of cosmic rays often causes transient faults in on-board computers, and one of the main problems caused by these faults is control-flow errors in the program. This paper proposes DCCLA, a software-implemented approach to detecting and correcting control-flow errors in linear assembly. DCCLA first divides the program into loop blocks and non-loop blocks and assigns formatted labels to the blocks. Then, based on an instruction-counting mechanism, DCCLA inserts counting and comparing instructions into every block in order to detect and correct control-flow errors that occur both between blocks (inter-block) and within a block (intra-block). To correct the data-flow errors caused by control-flow errors, DCCLA backs up the loop state and live variables. One advantage of DCCLA is that it can be configured flexibly according to reliability and performance requirements. The results of fault injection experiments show that the average failure rate of programs protected by DCCLA decreases to 4.25%, at the cost of increasing the average execution time by 41.7% and the average program size by 46.7%. Among three typical control-flow detection algorithms, DCCLA has the least impact on performance and space overhead, with correspondingly higher reliability.
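The counting-and-comparing idea described in the abstract can be illustrated with a small simulation. The sketch below is not the DCCLA implementation: it models blocks as Python lists of callables rather than linear assembly, and every name in it (`Block`, `legal_transition`, `run_counted`, `run_with_recovery`, `skip_index`) is a hypothetical chosen for illustration. It assumes that a fault is transient (re-executing the block succeeds), that an intra-block control-flow error shows up as a skipped instruction, and that restoring a backup of the live variables is enough to undo partial results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, int]            # live variables (hypothetical model)
Instr = Callable[[State], None]   # one "instruction" mutating the state


@dataclass
class Block:
    label: str             # label assigned to the block
    instrs: List[Instr]    # straight-line body of the block
    successors: List[str]  # labels this block may legally jump to


def legal_transition(src: Block, dst_label: str) -> bool:
    """Inter-block check: the jump target must be a known successor."""
    return dst_label in src.successors


def run_counted(block: Block, state: State,
                skip_index: Optional[int] = None) -> bool:
    """Run a block while counting executed instructions.

    skip_index simulates an injected fault that jumps over one
    instruction. Returns True when the count matches the expected
    block length, i.e. no intra-block error was observed.
    """
    executed = 0
    for index, instr in enumerate(block.instrs):
        if index == skip_index:
            continue  # simulated erroneous jump inside the block
        instr(state)
        executed += 1
    return executed == len(block.instrs)


def run_with_recovery(block: Block, state: State,
                      skip_index: Optional[int] = None) -> bool:
    """Back up live variables, run, and re-execute on a count mismatch.

    Returns True if an error was detected and corrected.
    """
    backup = dict(state)              # backup of live variables
    if run_counted(block, state, skip_index):
        return False                  # count matched, nothing to fix
    state.clear()
    state.update(backup)              # undo partial, corrupted results
    run_counted(block, state)         # assumption: fault was transient
    return True


def inc_x(s: State) -> None:
    s["x"] += 1


def double_x(s: State) -> None:
    s["x"] *= 2


def set_y(s: State) -> None:
    s["y"] = s["x"] + 3


B1 = Block("B1", [inc_x, double_x, set_y], ["B2"])
```

Calling `run_with_recovery(B1, {"x": 1, "y": 0}, skip_index=1)` detects that only two of the three instructions ran, restores the backup, and re-executes the block, so the final state matches a fault-free run. A real scheme pays for the counter updates and comparisons with extra instructions in every block, which is the kind of time and space overhead the abstract quantifies.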
