A Rigorous Approach to Fault-Tolerant System Development (Extended Abstract)

This paper investigates what it means for a system to behave correctly despite the occurrence of hardware faults. Using a stable storage system as a running example, a framework is presented for specifying, understanding, and verifying the correctness of fault-tolerant systems. A clear separation is made between the notions of software correctness and system reliability in the face of hardware malfunction. Correctness is established by using a programming logic augmented with fault axioms and rules. Stochastic modelling is employed to investigate the reliability and availability properties of the system.