A general method for introducing fault tolerance into a hierarchical operating system is presented. First, a hierarchically structured conventional (non-fault-tolerant) operating system is described. To transform it into a fault-tolerant system, each conventional machine is augmented with an Error Detection and Recovery (EDR) mechanism, yielding a corresponding fault-tolerant machine. From the standpoint of fault tolerance, three types of machines are identified: physical, kernel, and process. The EDR mechanism makes a conventional machine fault-tolerant by transforming its conventional operations into fault-tolerant operations; to provide this transformation, a set of operations is defined for the EDR mechanism. A model for fault-tolerant operations is developed in which known fault-tolerance techniques (e.g., recovery blocks and N-version programming) can be represented as particular cases. The resulting general fault-tolerant operating system is a hierarchy of fault-tolerant machines, with the physical type machines at the bottom, the kernel type machines above them, and the process type machines at the top.
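To make the two techniques named above concrete, here is a minimal Python sketch of a recovery block and of N-version programming, each written as a wrapper that turns plain operations into one fault-tolerant operation. It is an illustration under stated assumptions, not the model developed in the paper: the names `recovery_block`, `n_version`, and `OperationFailed`, the dictionary used as machine state, and the toy variants are all hypothetical.

```python
import copy
from collections import Counter


class OperationFailed(Exception):
    """Raised when no variant produces an acceptable result (hypothetical name)."""


def recovery_block(variants, acceptance_test, state):
    """Try variants in order; commit the first result that passes the test.

    Each variant runs on a private copy of the state, so a failed attempt
    leaves the caller's state untouched (a simple form of backward recovery).
    """
    for variant in variants:
        trial = copy.deepcopy(state)      # checkpoint: work on a copy
        try:
            result = variant(trial)
        except Exception:
            continue                      # error detected: try next alternate
        if acceptance_test(result):
            state.clear()
            state.update(trial)           # commit the successful attempt
            return result
    raise OperationFailed("all alternates failed the acceptance test")


def n_version(variants, state):
    """Run every variant and return the strict-majority result.

    Results must be hashable so that they can be counted as votes.
    """
    results = []
    for variant in variants:
        try:
            results.append(variant(copy.deepcopy(state)))
        except Exception:
            pass                          # a crashed version casts no vote
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if votes > len(variants) // 2:
            return value
    raise OperationFailed("no majority among versions")


# Toy variants of a "double x" operation (assumptions for illustration only).
def double_by_mul(s):
    return s["x"] * 2


def double_by_add(s):
    return s["x"] + s["x"]


def buggy(s):
    return s["x"] + 1


def corrupting(s):
    s["x"] = -1                           # damages its copy of the state, then fails
    raise RuntimeError("simulated fault")


def is_even(result):
    return result % 2 == 0                # doubling an integer always gives an even number


state = {"x": 4}
rb_result = recovery_block([corrupting, buggy, double_by_mul], is_even, state)
nv_result = n_version([double_by_mul, double_by_add, buggy], state)
```

In this sketch both wrappers share one shape (run one or more variants, then adjudicate), which is the sense in which a single model can cover both techniques: the recovery block adjudicates sequentially with an acceptance test, while N-version programming adjudicates by voting.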