Architectural concepts and redundancy techniques in fault-tolerant computers

This paper describes redundancy techniques employed in the design of fault-tolerant computers and discusses how functional requirements, technology constraints, and cost considerations affect the choice among these techniques. The STAR computer, developed at the Jet Propulsion Laboratory for long-duration planetary spacecraft missions, is discussed along with several later fault-tolerant computer designs. The class of computers described in this paper employs dynamic redundancy: the machine is divided into a set of submodules, each with standby spares, and a special hard-core monitor unit detects and diagnoses faults and effects automated recovery by replacing failed parts.
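
The dynamic-redundancy scheme summarized above can be illustrated with a minimal software sketch. It is not the STAR design or any machine discussed in the paper; the names `Unit`, `Submodule`, and `Monitor` are hypothetical, and fault detection is reduced to a boolean flag, whereas a real monitor would rely on hardware checks. The sketch only shows the control flow: detect a failed active unit, then replace it with a standby spare.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Unit:
    """One physical copy of a submodule (hypothetical model)."""
    name: str
    failed: bool = False  # assumption: a fault is visible as a simple flag


@dataclass
class Submodule:
    """A functional submodule: one active unit plus standby spares."""
    name: str
    active: Unit
    spares: List[Unit] = field(default_factory=list)


class Monitor:
    """Illustrative stand-in for the monitor unit: detect, then replace."""

    def __init__(self, submodules: List[Submodule]) -> None:
        self.submodules = submodules
        # Each entry: (submodule name, replaced unit, replacement unit)
        self.log: List[Tuple[str, str, str]] = []

    def check(self) -> List[str]:
        """Scan all submodules and switch in a spare wherever the active
        unit has failed. Returns names of submodules with no usable spare."""
        unrecoverable = []
        for sm in self.submodules:
            if not sm.active.failed:
                continue
            # Discard spares that have themselves failed while on standby.
            while sm.spares and sm.spares[0].failed:
                sm.spares.pop(0)
            if sm.spares:
                old = sm.active
                sm.active = sm.spares.pop(0)
                self.log.append((sm.name, old.name, sm.active.name))
            else:
                unrecoverable.append(sm.name)
        return unrecoverable
```

In this sketch, a submodule survives as long as at least one working spare remains; once the spares are exhausted, `check` reports the submodule as unrecoverable rather than attempting any further recovery.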