As the field of fault-tolerant computing matures and its results are put into practical use, the effects of a failure in a computer system need no longer be catastrophic. With good fault-detection mechanisms it is now possible to cover a very high percentage of all the failures that can occur. Once a fault is detected, systems are designed to reconfigure and proceed with either full or degraded performance, depending on how much redundancy is built into the system. Note that a given failure may have different effects depending on the circumstances and the time at which it occurs.
Today, large numbers of resources are being tied together in complex computer systems, either locally or in geographically distributed systems and networks. In such systems it is clearly undesirable for the failure of one element to bring the entire system down. On the other hand, one usually cannot afford to design the system with enough redundancy to mask the effects of all failures immediately.