Paper information - Error correction based on hamming distance preserving in arithmetical and logical operations

The traditional approach to fault-tolerant computing replicates computation units and applies a majority vote to the individual result bits. This approach has several limitations, the most severe being its resource requirement. This paper presents a new method for fault-tolerant computing in which, for a given error rate, the Hamming distance between correct and faulty inputs, as well as the Hamming distance between a correct result and a faulty result, is preserved throughout processing, thereby enabling correction of up to transient faults per computation cycle. The new method is compared and contrasted with current protection methods, and its cost/performance is analyzed.
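The replicate-and-vote baseline described in the abstract can be illustrated with a minimal Python sketch. This shows only the traditional triple-replication scheme and the Hamming distance metric, not the paper's new method. The function names `hamming_distance`, `majority_vote`, and `tmr`, and the `fault_mask` parameter used to simulate a transient fault, are hypothetical and chosen for illustration.

```python
from typing import Callable


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def majority_vote(r0: int, r1: int, r2: int) -> int:
    """Bitwise majority of three results: each output bit is the value
    held by at least two of the three inputs at that position."""
    return (r0 & r1) | (r0 & r2) | (r1 & r2)


def tmr(op: Callable[[int, int], int], x: int, y: int, fault_mask: int = 0) -> int:
    """Run `op` on three replicated units and vote on the result bits.

    Assumption for illustration: a transient fault is modelled by
    XOR-ing `fault_mask` into the output of the second unit only.
    """
    r0 = op(x, y)
    r1 = op(x, y) ^ fault_mask  # simulated faulty unit
    r2 = op(x, y)
    return majority_vote(r0, r1, r2)


if __name__ == "__main__":
    add = lambda a, b: a + b
    correct = add(3, 4)
    faulty = correct ^ 0b101
    print("distance between correct and faulty result:",
          hamming_distance(correct, faulty))
    print("voted result:", tmr(add, 3, 4, fault_mask=0b101))
```

The vote masks any fault pattern confined to a single unit, but it requires three copies of the computation unit plus the voter, which is the resource cost the abstract identifies as the main limitation.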
