Real-Number Codes for Fault-Tolerant Matrix Operations on Processor Arrays

A generalization of existing real-number codes is proposed. It is proven that linearity is a necessary and sufficient condition for codes used in fault-tolerant matrix operations such as matrix addition, multiplication, transposition, and LU decomposition. It is also proven that for every linear code defined over a finite field there exists a corresponding linear real-number code with similar error-detecting capabilities. Encoding schemes are given for some of the example codes that fall under this general class of real-number codes. With the help of experiments, a rule is derived for selecting a particular code for a given application. The performance overhead of fault-tolerance schemes using the generalized encoding schemes is shown to be very low, and this is substantiated through simulation experiments.
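To illustrate why linearity matters, the sketch below uses the weighted-checksum encoding familiar from algorithm-based fault tolerance. This is one simple member of the family of linear real-number codes and is not claimed to be the paper's specific encoding scheme. The weight vector `w`, the helper names (`encode_columns`, `encode_rows`, `verify`), and the round-off tolerance `tol` are hypothetical choices made for the example. Because the encoding is linear, multiplying a column-encoded A by a row-encoded B yields the full-checksum encoding of AB. A single faulty element of the product is then located by the one row and the one column whose checksums fail.

```python
import math


def matmul(a, b):
    """Plain triple-loop product of list-of-lists matrices a (m x n) and b (n x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def encode_columns(a, w):
    """Column-checksum encoding: append the row w^T a (weighted sum of the rows)."""
    check = [sum(w[i] * a[i][j] for i in range(len(a))) for j in range(len(a[0]))]
    return [row[:] for row in a] + [check]


def encode_rows(b, w):
    """Row-checksum encoding: append the column b w (weighted sum of the columns)."""
    return [row[:] + [sum(w[j] * row[j] for j in range(len(row)))] for row in b]


def verify(c_full, w_row, w_col, tol=1e-9):
    """Recompute both checksums of a full-checksum matrix.

    w_row must be the weights used by encode_columns and w_col the weights used
    by encode_rows. Returns (bad_rows, bad_cols): the data rows and columns whose
    stored checksum disagrees with the recomputed one beyond the assumed tolerance.
    """
    m, p = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(m)
                if not math.isclose(sum(w_col[j] * c_full[i][j] for j in range(p)),
                                    c_full[i][p], rel_tol=tol, abs_tol=tol)]
    bad_cols = [j for j in range(p)
                if not math.isclose(sum(w_row[i] * c_full[i][j] for i in range(m)),
                                    c_full[m][j], rel_tol=tol, abs_tol=tol)]
    return bad_rows, bad_cols


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    w = [1.0, 2.0]  # hypothetical weights; [1.0, 1.0] gives the plain checksum
    # Linearity: (column-encoded A) x (row-encoded B) is the full-checksum encoding of A B.
    C_full = matmul(encode_columns(A, w), encode_rows(B, w))
    print(verify(C_full, w, w))  # ([], [])
    C_full[1][0] += 0.5          # inject a single faulty element
    print(verify(C_full, w, w))  # ([1], [0])
```

The same linearity argument covers addition, since the sum of two encoded matrices is the encoding of the sum, and transposition, which simply swaps the roles of the checksum row and checksum column. The tolerance trades missed detections against false alarms caused by round-off. The value used here is only an assumption.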