Implementation of an Experimental Fault-Tolerant Memory System

The experimental fault-tolerant memory system described in this paper has been designed to enable the modular addition of spares, to validate the theoretical fault-secure and self-testing properties of the translator/corrector, to provide a basis for experiments using the new testing and correction processes for recovery, and to determine the practicality of such systems. The hardware design and implementation are described, together with methods of fault insertion. The hardware/software interface, including a restricted single error correction/double error detection (SEC/DED) code, is specified. Procedures are described that 1) test for specified physical faults, 2) ensure that single error corrections are not miscorrections due to triple faults, and 3) enable recovery from double errors.
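
To illustrate why single error corrections must be checked against triple faults, the following is a minimal sketch using a generic extended Hamming (8,4) code. This is an assumption made for illustration only: the restricted SEC/DED code used in the experimental system is not given in this abstract, and the names `encode`, `decode`, and `flip` are hypothetical. The sketch shows that a single error is corrected, a double error is detected, and a triple error is indistinguishable from a single error and is therefore miscorrected.

```python
# Illustrative stand-in: extended Hamming (8,4) SEC/DED code.
# NOT the paper's restricted code; all names here are hypothetical.

def encode(data):
    """Encode 4 data bits into an 8-bit codeword.

    Positions 1, 2, 4 hold Hamming parity bits, positions 3, 5, 6, 7 hold
    data bits, and position 0 holds an overall (even) parity bit.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Parity over every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code) % 2  # makes the total number of ones even
    return code


def decode(word):
    """Return (status, word) where status is 'ok', 'corrected', or 'double'."""
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i  # XOR of the positions of all set bits
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        return "ok", list(word)
    if overall == 1:
        # Odd number of errors: assumed to be a single error at `syndrome`
        # (syndrome 0 means the overall parity bit itself is wrong).
        fixed = list(word)
        fixed[syndrome] ^= 1
        return "corrected", fixed
    # Even number of errors with a nonzero syndrome: uncorrectable.
    return "double", None


def flip(word, positions):
    """Return a copy of word with the given bit positions inverted."""
    out = list(word)
    for pos in positions:
        out[pos] ^= 1
    return out


codeword = encode([1, 0, 1, 1])
single = decode(flip(codeword, [5]))        # corrected back to codeword
double = decode(flip(codeword, [2, 6]))     # detected, not corrected
triple = decode(flip(codeword, [1, 2, 4]))  # reported as "corrected", but wrong
```

In this sketch the triple fault yields the status `"corrected"` while the returned word differs from the original codeword, which is the miscorrection case that procedure 2) is intended to rule out.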