A Fault-tolerant Architecture with Error Correcting Code for the Instruction-level Temporal Redundancy

Soft errors have become an increasingly significant problem in modern computing systems. Instruction-level temporal redundancy in out-of-order cores can overcome soft errors, but it has been reported to incur a performance penalty of up to 45%. In this work, we propose a fault-tolerant double-execution architecture that uses a fast error-correcting code (such as a two-dimensional error code) in the instruction reuse buffer. Experimental results show that it gains back between 9.14% and 10.15% of the IPC loss, with an average of around 9.22%, compared with the conventional double-execution approach.
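To make the idea of a two-dimensional error code concrete, the following is a minimal sketch of generic row-and-column parity over a small block of buffer words. It is not the encoding used in this work, which the abstract does not specify. The word width, block size, and all function names (`encode`, `check_and_correct`) are assumptions chosen for illustration.

```python
from typing import List, Tuple

WORD_BITS = 8  # assumed word width, for illustration only


def parity(x: int) -> int:
    """Return the even-parity bit of an integer (1 if the popcount is odd)."""
    return bin(x).count("1") & 1


def encode(words: List[int]) -> Tuple[List[int], int]:
    """Compute one horizontal parity bit per word and one vertical parity word."""
    row_parity = [parity(w) for w in words]
    col_parity = 0
    for w in words:
        col_parity ^= w  # bit i of col_parity is the parity of column i
    return row_parity, col_parity


def check_and_correct(
    words: List[int], row_parity: List[int], col_parity: int
) -> Tuple[List[int], str]:
    """Detect errors in the data words and repair a single flipped data bit.

    This sketch assumes the stored parity bits themselves are error-free.
    """
    bad_rows = [i for i, w in enumerate(words) if parity(w) != row_parity[i]]
    syndrome = col_parity
    for w in words:
        syndrome ^= w  # non-zero bits mark columns whose parity mismatches

    if not bad_rows and syndrome == 0:
        return list(words), "clean"
    if len(bad_rows) == 1 and bin(syndrome).count("1") == 1:
        # One bad row and one bad column intersect at the flipped bit.
        fixed = list(words)
        fixed[bad_rows[0]] ^= syndrome
        return fixed, "corrected"
    return list(words), "uncorrectable"


if __name__ == "__main__":
    block = [0b10110010, 0b01101100, 0b11110000, 0b00001111]
    rows, cols = encode(block)
    corrupted = list(block)
    corrupted[2] ^= 1 << 5  # inject a single-bit soft error
    repaired, status = check_and_correct(corrupted, rows, cols)
    print(status, repaired == block)
```

In this sketch a single flipped data bit is located at the intersection of the failing row and the failing column, so it can be repaired without re-executing the instruction. Anything beyond that is only reported as uncorrectable.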
