Fault-tolerant computer systems

The paper reviews the methods by which reliable processing and control can be achieved using fault-tolerant digital computers. The motivation for employing such systems is discussed, together with an indication of current and potential areas of application. The features of fault-tolerant computers are described in general terms, together with a system design procedure specific to the development of a reliable computer. Adherence to a well-structured design methodology is particularly important in a fault-tolerant computer to ensure an initially fault-free system. To follow this design procedure, an intimate knowledge of the following subject areas is required: fault classification, redundancy techniques and their relative merits, and reliability modelling and analysis. These topics are covered in the paper, with particular emphasis placed upon the implementation of hardware, software, data and time-redundancy techniques. Examples of fault-tolerant computers proposed and produced in the last decade are described. The availability of large-scale integrated (LSI) circuits, and in particular microprocessors, will have a profound effect on the development and application of fault-tolerant computers. The implications of using LSI in this area are therefore discussed, including a brief description of two fundamentally different approaches to the realisation of a fault-tolerant microcomputer.