Correlated Failures in Fault-Tolerant Computers

In two repairable, ground-based fault-tolerant computer systems, in which constraints on switchover time permitted manual switching as a backup, correlated failures were an important cause of system outage. In one of the systems, a distinction could be made between outages that occurred while one computer was undergoing scheduled maintenance and outages that occurred while one computer was being repaired. The failure rate of the active computer was at least four times higher in the latter case. Several possible causes are described, but none could be confirmed from the available data. In some situations, correlated failures call for a reliability model different from the commonly described models for imperfect coverage.
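
To illustrate why an elevated failure rate during repair matters for outage estimates, the following is a minimal sketch, not the model used in the study. It assumes a two-computer repairable system with exponential failure and repair times, and it approximates the outage rate as the rate of a first failure multiplied by the probability that the surviving computer fails before the repair completes. The function name, the parameter names, and all numeric values are hypothetical; only the factor of four is taken from the text above.

```python
def outage_rate(failure_rate, mean_repair_time, correlation_factor=1.0):
    """Approximate outage rate (per hour) of a two-computer repairable system.

    failure_rate       -- failure rate of one computer, per hour (hypothetical value)
    mean_repair_time   -- mean time to repair one computer, in hours (hypothetical value)
    correlation_factor -- multiplier applied to the active computer's failure
                          rate while its partner is being repaired
                          (1.0 means the failures are independent)

    Assumes exponential failure and repair times, and that the system
    spends almost all of its time with both computers operational.
    """
    repair_rate = 1.0 / mean_repair_time
    rate_during_repair = correlation_factor * failure_rate

    # Probability that the survivor fails before the repair is finished
    # (a race between two exponential events).
    p_second_failure = rate_during_repair / (rate_during_repair + repair_rate)

    # Either computer can fail first, hence the factor of two.
    return 2.0 * failure_rate * p_second_failure


if __name__ == "__main__":
    lam = 1e-3   # hypothetical: one failure per 1000 hours
    mttr = 10.0  # hypothetical: ten-hour mean repair time

    independent = outage_rate(lam, mttr)
    correlated = outage_rate(lam, mttr, correlation_factor=4.0)

    print("independent failures:", independent)
    print("correlated failures: ", correlated)
    print("ratio:               ", correlated / independent)
```

Under these assumptions, a model that treats the two computers as failing independently would understate the outage rate by roughly the correlation factor, which is a different effect from the one captured by an imperfect-coverage parameter.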