Reliable Computer Systems

At present, the fault tolerance community is hampered by its use of conflicting terms for closely related fault tolerance concepts. This paper presents informal but precise definitions and terminology for these concepts. In particular, the terms fault, error, and failure are carefully defined and distinguished. The aim is to promote discussion in the hope that an agreed terminology will emerge.
