Fault-Tolerance CMP Architecture based on SMT Technology

To improve the reliability of single-chip multiprocessors (CMPs), this paper proposes a fault-tolerant CMP architecture that incorporates simultaneous multithreading (SMT) technology to detect transient faults and to perform thread-level recovery automatically. By adopting simple strategies and a small amount of extra hardware to implement fault tolerance, the architecture attains wider fault coverage and improves the performance of the fault-tolerant CMP.
