Exploiting Slack for Low Overhead Soft Error Reliability

Designing low-overhead mechanisms for improving soft error reliability will be a key requirement in future technology generations. The slack of an instruction is the number of cycles it can be delayed without extending the program's execution time. We observe that although slack cycles do not affect execution time, they add to the total number of program cycles that are vulnerable to soft errors. In this paper, we show that exploiting slack to reduce this component of vulnerable cycles has significant potential to improve soft error reliability. We explore two mechanisms for exploiting slack, which reduce the soft error rate by 34-42% at a performance overhead of only 6-10%. We also demonstrate that, whereas redundant execution incurs a high performance overhead, these techniques adapt efficiently to the amount of slack available in different programs, achieving reliability improvements with minimal performance overhead.
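
To make the definition of slack concrete, here is a minimal sketch that computes it on a small instruction dependence graph. It assumes an idealized dataflow machine with unlimited issue width and no structural hazards. The instruction names and latencies are hypothetical. The "waiting cycles" count (cycles a finished result sits idle before its first consumer starts) is only a toy proxy for slack-induced vulnerable cycles. It is not the paper's vulnerability methodology or either of its two mechanisms.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Instr:
    name: str
    latency: int                                    # execution latency in cycles (hypothetical)
    deps: list[str] = field(default_factory=list)   # names of producer instructions

def schedule(instrs):
    """ASAP/ALAP analysis of a dependence DAG. Assumes `instrs` is already in
    topological (program) order and that issue width is unlimited."""
    by_name = {i.name: i for i in instrs}
    order = [i.name for i in instrs]

    # Earliest start: an instruction can start once all of its producers finish.
    est = {}
    for n in order:
        est[n] = max((est[p] + by_name[p].latency for p in by_name[n].deps), default=0)
    total = max(est[n] + by_name[n].latency for n in order)

    # Build the consumer lists (reverse edges of the dependence graph).
    consumers = {n: [] for n in order}
    for n in order:
        for p in by_name[n].deps:
            consumers[p].append(n)

    # Latest start: as late as possible without pushing completion past `total`.
    lst = {}
    for n in reversed(order):
        finish_by = min((lst[c] for c in consumers[n]), default=total)
        lst[n] = finish_by - by_name[n].latency

    # Slack: how far this instruction alone can be delayed without extending execution time.
    slack = {n: lst[n] - est[n] for n in order}
    # Toy proxy: cycles a finished result waits (in the ASAP schedule) before its
    # first consumer starts, or before the program ends if it has no consumers.
    waiting = {
        n: min((est[c] for c in consumers[n]), default=total) - (est[n] + by_name[n].latency)
        for n in order
    }
    return est, total, slack, waiting

def total_time_with_delay(instrs, delayed, extra):
    """Re-run the ASAP schedule with one instruction held back by `extra` cycles."""
    by_name = {i.name: i for i in instrs}
    start = {}
    for i in instrs:
        ready = max((start[p] + by_name[p].latency for p in i.deps), default=0)
        start[i.name] = ready + (extra if i.name == delayed else 0)
    return max(start[n] + by_name[n].latency for n in start)

# Hypothetical four-instruction program: a load-multiply chain is critical,
# while the independent add has slack.
prog = [
    Instr("ld_a", 3),
    Instr("add_b", 1),
    Instr("mul_c", 2, ["ld_a"]),
    Instr("sub_d", 1, ["add_b", "mul_c"]),
]
est, total, slack, waiting = schedule(prog)
print(total, slack)                                # 6 {'ld_a': 0, 'add_b': 4, 'mul_c': 0, 'sub_d': 0}
print(sum(waiting.values()))                       # 4
print(total_time_with_delay(prog, "add_b", 4))     # 6: delay within slack is free
print(total_time_with_delay(prog, "add_b", 5))     # 7: one cycle beyond slack extends execution
```

In this toy run, `add_b` finishes at cycle 1 but its result idles until `sub_d` starts at cycle 5. Those four cycles add nothing to performance. Under the stated assumptions, they are exactly the kind of cycles the abstract describes as adding to vulnerable time without affecting execution time. Slack here is computed per instruction, so it can be shared along a path. Delaying several instructions at once by their individual slack values can still extend execution time.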
