Error correction based on Hamming distance preserving in arithmetical and logical operations

The traditional approach to fault-tolerant computing replicates computation units and applies a majority vote to the individual result bits. This approach has several limitations, the most severe being its resource requirement. This paper presents a new method for fault-tolerant computing in which, for a given error rate, the Hamming distance between correct and faulty inputs, as well as the Hamming distance between a correct result and a faulty result, is preserved throughout processing, thereby enabling correction of up to transient faults per computation cycle. The new method is compared and contrasted with current protection methods, and its cost/performance is analyzed.
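As a rough illustration of the two notions the abstract relies on, the following Python sketch shows a bitwise majority vote over three replicated results and a Hamming distance computation. The function names and the 8-bit example values are assumptions made for illustration only. The final check uses XOR with a common operand, a simple operation that is known to preserve Hamming distance; it is not presented as the paper's construction.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def majority_vote(r0: int, r1: int, r2: int) -> int:
    """Bitwise majority of three replicated results (triple modular redundancy)."""
    return (r0 & r1) | (r0 & r2) | (r1 & r2)


# Hypothetical 8-bit result and a copy corrupted by a single transient bit flip.
correct = 0b10110010
faulty = correct ^ 0b00000100

# Two good replicas outvote the faulty one on every bit.
voted = majority_vote(correct, faulty, correct)

# XOR with a shared operand leaves the distance between the two values unchanged.
operand = 0b01101001
distance_before = hamming_distance(correct, faulty)
distance_after = hamming_distance(correct ^ operand, faulty ^ operand)
```

Here `voted` equals `correct`, and `distance_before` equals `distance_after`. The vote needs three full copies of the computation unit, which is the resource cost the abstract refers to.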
