In this work we propose a set of compiler optimizations that identify and remove redundant checks from replicated code. Two checks are considered redundant if they check the same variable. We evaluate two levels of hardware or system support: memory without support for checkpointing and rollback, where memory is guaranteed not to be corrupted with wrong values, and memory with low-cost support for checkpointing and rollback. We also consider platforms whose register file is protected with parity or ECC, such as Intel Itanium, Sun UltraSPARC, and IBM Power4-6, because software implementations can take advantage of this hardware feature to eliminate some of the replicated instructions. We evaluated our approach using LLVM as the compiler infrastructure and PIN for fault injection. Our experimental results with SPEC benchmarks on a Pentium 4 show that, when memory is guaranteed not to be corrupted, performance improves by 6.2% on average. With support for checkpointing and rollback, performance improves by 14.7% on average. A software fault-tolerant system that takes advantage of register-safe platforms improves by 16.0% on average. Fault injection experiments show that our techniques do not decrease fault coverage, although they slightly increase the number of segmentation faults.
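The redundancy criterion stated above (two checks are redundant if they check the same variable) can be illustrated with a minimal sketch. This is a toy model, not the LLVM pass evaluated in this work: the `Instr` type, the `remove_redundant_checks` function, and the three opcode names are hypothetical, and the choices to keep the first check and to treat a redefinition as invalidating earlier checks are assumptions made only for this illustration.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    """Hypothetical toy instruction: an opcode and the variable it touches."""
    op: str   # one of "def", "check", "store" (names invented for this sketch)
    var: str  # variable that is defined, checked, or stored


def remove_redundant_checks(block: List[Instr]) -> List[Instr]:
    """Drop a check if the same variable was already checked in this block.

    Assumption of this sketch: the first check is kept and later ones are
    dropped; a redefinition of the variable makes the next check necessary.
    """
    checked = set()
    result = []
    for instr in block:
        if instr.op == "def":
            # The variable holds a new value, so earlier checks no longer cover it.
            checked.discard(instr.var)
        elif instr.op == "check":
            if instr.var in checked:
                continue  # same variable already checked: redundant
            checked.add(instr.var)
        result.append(instr)
    return result


block = [
    Instr("def", "x"),
    Instr("check", "x"),
    Instr("store", "x"),
    Instr("check", "x"),  # redundant under the criterion above
    Instr("def", "y"),
    Instr("check", "y"),
]
optimized = remove_redundant_checks(block)

# A redefinition between two checks keeps both of them.
redefined = remove_redundant_checks(
    [Instr("def", "x"), Instr("check", "x"), Instr("def", "x"), Instr("check", "x")]
)
```

In this sketch the second check of `x` disappears from `optimized`, while both checks survive in `redefined` because `x` is redefined in between. A real implementation on an SSA representation such as LLVM IR would not need the redefinition case, since each value is defined once.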