Using Genetic Algorithm to Identify Soft-Error Derating Blocks of an Application Program

Soft errors are increasingly regarded as a major cause of computer system failures. Software-based techniques offer a cost-effective and flexible way to tolerate soft errors, but the overhead they introduce is unacceptable in some safety-critical real-time systems. Identifying the vulnerable program blocks and protecting only those blocks against soft errors reduces this performance overhead. In this paper, we present a genetic algorithm that identifies both the vulnerable program blocks and the derating program blocks, in which soft errors are largely masked and do not lead to failures. Only the vulnerable blocks are then protected with software-based soft-error tolerance techniques, which lowers both the performance and the space overhead. The genetic algorithm is implemented in C++ as an automated tool. To evaluate the algorithm, errors are injected using the SimpleScalar toolset. The experimental results indicate that this method is more effective than previous methods.
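The abstract does not specify the chromosome encoding or the fitness function, so the following is only a rough Python sketch under a hypothetical, knapsack-style formulation. It assumes each program block has an estimated vulnerability `vuln[i]` (for example, the fraction of injected faults in that block that caused a failure) and a protection overhead `cost[i]`. A chromosome is a bit vector marking which blocks to protect, and fitness is the vulnerability covered within an overhead budget. The names `fitness` and `select_blocks`, the budget, and the penalty for over-budget plans are all illustrative assumptions, not the paper's tool.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fitness(chrom: Sequence[int], vuln: Sequence[float],
            cost: Sequence[float], budget: float) -> float:
    """Total vulnerability covered by the protected blocks (bit = 1).

    Hypothetical model: plans whose protection overhead exceeds the budget
    get a negative score, so any feasible plan beats any infeasible one.
    """
    covered = sum(v for bit, v in zip(chrom, vuln) if bit)
    spent = sum(c for bit, c in zip(chrom, cost) if bit)
    return covered if spent <= budget else budget - spent


def select_blocks(vuln: Sequence[float], cost: Sequence[float], budget: float,
                  pop_size: int = 40, generations: int = 200,
                  mutation_rate: Optional[float] = None,
                  seed: int = 0) -> Tuple[List[int], float]:
    """Genetic search for the set of blocks to protect under an overhead budget."""
    rng = random.Random(seed)              # fixed seed -> reproducible runs
    n = len(vuln)
    if mutation_rate is None:
        mutation_rate = 1.0 / max(n, 1)    # common 1/L bit-flip rate

    def score(chrom: Sequence[int]) -> float:
        return fitness(chrom, vuln, cost, budget)

    def tournament(pop: List[List[int]]) -> List[int]:
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if score(a) >= score(b) else b

    # Random initial population of protection plans.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]                    # elitism: keep the best plan so far
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n) if n > 1 else 0   # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each gene with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            nxt.append(child)
        pop = nxt
        cand = max(pop, key=score)
        if score(cand) > score(best):
            best = cand[:]
    return best, score(best)


if __name__ == "__main__":
    # Hypothetical per-block data for a five-block program.
    vuln = [0.9, 0.1, 0.5, 0.05, 0.7]
    cost = [3, 1, 2, 1, 2]
    plan, covered = select_blocks(vuln, cost, budget=5)
    print("protect blocks:", [i for i, bit in enumerate(plan) if bit], covered)
```

Tournament selection, one-point crossover, bit-flip mutation and elitism are generic GA operators chosen here for brevity. In this sketch, the blocks left at 0 are the ones treated as low priority, which is where blocks with high derating (errors mostly masked) would tend to end up.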
