Fault Tolerant Operating Systems

This paper develops four related architectural principles that can guide the construction of error-tolerant operating systems. The fundamental principle, system closure, specifies that no action is permissible unless explicitly authorized. The capability-based machine is the most efficient known embodiment of this principle: it allows efficient small access domains, multiple-domain processes without a privileged mode of operation, and user and system descriptor information protected by the same mechanism. System closure implies a second principle, resource control, which prevents processes from exchanging information via residual values left in physical resource units. These two principles enable a third, decision verification by failure-independent processes. These principles enable prompt error detection and cost-effective recovery. Implementations of these principles are given for process management, interrupts and traps, store access through capabilities, protected procedure entry, and tagged architecture.
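As an illustration only, the first two principles can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `Capability`, `Store`, `restrict`, and `AccessDenied`, and the generation counter used to invalidate stale capabilities, are all hypothetical. A real capability machine enforces unforgeability in hardware (for example with tagged storage), which Python cannot do.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when an action is not explicitly authorized."""


@dataclass(frozen=True)
class Capability:
    # Hypothetical capability: names one storage unit and the rights held on it.
    obj_id: int
    generation: int
    rights: frozenset


def restrict(cap, rights):
    """Derive a weaker capability; rights can only be removed, never added."""
    return Capability(cap.obj_id, cap.generation, cap.rights & frozenset(rights))


class Store:
    """Toy store in which every access must present a capability."""

    def __init__(self, n_units, unit_size):
        self._units = [bytearray(unit_size) for _ in range(n_units)]
        self._gen = [0] * n_units
        self._free = list(range(n_units))

    def allocate(self):
        unit = self._free.pop()
        return Capability(unit, self._gen[unit], frozenset({"read", "write"}))

    def release(self, cap):
        self._check(cap, "write")
        unit = self._units[cap.obj_id]
        unit[:] = bytes(len(unit))   # resource control: erase residual values
        self._gen[cap.obj_id] += 1   # capabilities issued earlier become stale
        self._free.append(cap.obj_id)

    def _check(self, cap, right):
        # System closure: deny unless the capability is current and grants the right.
        if cap.generation != self._gen[cap.obj_id] or right not in cap.rights:
            raise AccessDenied(right)

    def read(self, cap, offset):
        self._check(cap, "read")
        return self._units[cap.obj_id][offset]

    def write(self, cap, offset, value):
        self._check(cap, "write")
        self._units[cap.obj_id][offset] = value
```

In this sketch, a process holding only `restrict(cap, {"read"})` cannot write, and a unit that is released and reallocated reads as zero to its next holder, so no information passes between holders through the unit's old contents.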
