An Approach for Error Detection and Error Correction in Distributed Systems Computing Numerical Functions

We consider methods for error detection and/or error correction in the software and hardware of a distributed system that computes values of numerical functions. These methods use software and hardware redundancy to compute additional check functions, which are easily derived for any given multiplicity of errors. The required redundancy is independent of the number of processors in the original system and depends only on the multiplicity of errors. We describe methods for constructing optimal checks, the software and hardware redundancy they require, and the implementation of the corresponding error-detecting and error-correcting procedures by a distributed system.
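
To make the idea of check functions concrete, the following is a minimal sketch and not the construction developed in this paper. It assumes, purely for illustration, that each processor computes an integer linear function of a shared input, that at most one processor output is erroneous, and that the check units themselves are fault-free. The names `make_checks` and `locate_and_correct` are hypothetical. Under these assumptions, two extra check functions suffice to locate and correct a single error, whatever the number of processors.

```python
def dot(a, x):
    """Integer inner product of two equal-length sequences."""
    return sum(ai * xi for ai, xi in zip(a, x))


def make_checks(coeffs):
    """Derive two check functions from the processors' coefficient rows.

    coeffs[i] holds the coefficients of the (assumed linear) function
    computed by processor i. The first check is the plain sum of all rows,
    the second is the sum weighted by the processor index (1-based).
    """
    n, m = len(coeffs), len(coeffs[0])
    plain = [sum(coeffs[i][j] for i in range(n)) for j in range(m)]
    weighted = [sum((i + 1) * coeffs[i][j] for i in range(n)) for j in range(m)]
    return plain, weighted


def locate_and_correct(outputs, x, plain, weighted):
    """Detect, locate, and correct at most one erroneous processor output.

    Returns (index_of_faulty_processor_or_None, corrected_outputs).
    Assumes the two check values are computed without error.
    """
    n = len(outputs)
    # Syndromes: both are zero when every output is correct.
    s1 = sum(outputs) - dot(plain, x)
    s2 = sum((i + 1) * y for i, y in enumerate(outputs)) - dot(weighted, x)
    if s1 == 0 and s2 == 0:
        return None, list(outputs)
    # A single error e at processor k gives s1 = e and s2 = (k + 1) * e.
    if s1 == 0 or s2 % s1 != 0 or not 1 <= s2 // s1 <= n:
        raise ValueError("error pattern exceeds the single-error assumption")
    k = s2 // s1 - 1
    fixed = list(outputs)
    fixed[k] -= s1
    return k, fixed


# Usage: three processors, one of which returns a corrupted value.
coeffs = [[1, 2], [3, 4], [5, 6]]
x = [2, 3]
outputs = [dot(row, x) for row in coeffs]
outputs[1] += 7  # inject a single error into processor 1
plain, weighted = make_checks(coeffs)
faulty, corrected = locate_and_correct(outputs, x, plain, weighted)
print(faulty, corrected)
```

In this sketch the redundancy is two check computations whether there are three processors or three hundred; detecting a single error without correcting it would need only the first check.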