Structuring Distributed Systems for Recoverability and Crash Resistance

An object-oriented multilevel model of computation is used to discuss recoverability and crash resistance issues in distributed systems. Of particular importance are the issues raised when recoverability and crash resistance properties are required of objects whose concrete representations are distributed over several nodes. The execution of a program at one node of the system can give rise to a hierarchy of processes executing various parts of the program at different nodes. Recoverability and crash resistance properties are needed to ensure that such a group of processes leaves the system state consistent despite faults in the system.
