Paper Information - From Recovery Blocks to Concurrent Atomic Actions

From Recovery Blocks to Concurrent Atomic Actions

This paper reviews the development of error recovery structures that support general fault tolerance, and describes a new object-oriented scheme for error recovery in concurrent systems that generalizes existing schemes based on either conversations or transactions. This new scheme, which is based on what we term a Coordinated Atomic Action, is intended to make it easier to tolerate hardware and software faults, as well as faults that have affected the environment of the computer system, and to do so for programs that involve cooperating concurrent processes and the use of shared resources.

[1] K. H. Kim,et al. A distributed fault tolerant architecture for nuclear reactor and other critical process control applications , 1991, [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium.

[2] K. H. Kim,et al. Approaches to Mechanization of the Conversation Scheme Based on Monitors , 1982, IEEE Transactions on Software Engineering.

[3] Brian Randell,et al. Object-Oriented Software Fault Tolerance: Framework, reuse and design diversity , 1993 .

[4] Hermann Kopetz,et al. Fault tolerance, principles and practice , 1990 .

[5] K. H. Kim,et al. Distributed Execution of Recovery Blocks: An Approach to Uniform Treatment of Hardware and Software Faults , 1984, IEEE International Conference on Distributed Computing Systems.

[6] Brian Randell. Fault Tolerance and System Structuring , 1984 .

[7] Santosh K. Shrivastava,et al. An overview of the Arjuna distributed programming system , 1991, IEEE Software.

[8] Cecília M. F. Rubira,et al. Fault tolerance in concurrent object-oriented software through coordinated error recovery , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[9] Brian Randell. System structure for software fault tolerance , 1975 .

[10] John C. Knight,et al. A Framework for Software Fault Tolerance in Real-Time Systems , 1983, IEEE Transactions on Software Engineering.

[11] Brian Randell,et al. The Evolution of the Recovery Block Concept , 1994 .

[12] D. B. Lomet. Process structuring, synchronization, and recovery using atomic actions , 1977 .

[13] David F. McAllister,et al. The consensus recovery block , 1983 .

[14] Gerald M. Masson,et al. Using certification trails to achieve software fault tolerance , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[15] Parameswaran Ramanathan,et al. Checkpointing and rollback recovery in a distributed system using common time base , 1988, Proceedings [1988] Seventh Symposium on Reliable Distributed Systems.

[16] Michel Banâtre,et al. The Concept of Multi-function: A General Structuring Tool for Distributed Operating System , 1986, ICDCS.

[17] Peter A. Barrett,et al. Software Fault Tolerance: An Evaluation , 1985, IEEE Transactions on Software Engineering.

[18] K. H. Kim,et al. Distributed Execution of Recovery Blocks: An Approach for Uniform Treatment of Hardware and Software Faults in Real-Time Applications , 1989, IEEE Trans. Computers.

[19] Flaviu Cristian,et al. Exception Handling and Software Fault Tolerance , 1982, IEEE Transactions on Computers.

[20] William E. Weihl,et al. Implementation of resilient, atomic data types , 1985, TOPL.

[21] H. Hecht,et al. Fault-Tolerant Software for Real-Time Applications , 1976, CSUR.

[22] Barbara Liskov,et al. Distributed programming in Argus , 1988, CACM.

[23] Santosh K. Shrivastava,et al. Reliable Resource Allocation Between Unreliable Processes , 1978, IEEE Transactions on Software Engineering.

[24] Peter A. Lee. A Reconsideration of the Recovery Block Scheme , 1978, Comput. J..

[25] Kang G. Shin,et al. Evaluation of Error Recovery Blocks Used for Cooperating Processes , 1984, IEEE Transactions on Software Engineering.

[26] Maurice Herlihy,et al. Apologizing versus asking permission: optimistic concurrency control for abstract data types , 1990, TODS.

[27] Paul Ammann,et al. Data Diversity: An Approach to Software Fault Tolerance , 1988, IEEE Trans. Computers.

[28] K.H. Kim,et al. A highly decentralized implementation model for the programmer-transparent coordination (PTC) scheme for cooperative recovery , 1990, [1990] Digest of Papers. Fault-Tolerant Computing: 20th International Symposium.

[29] B. Randell,et al. State Restoration in Distributed Systems , 1995, Twenty-Fifth International Symposium on Fault-Tolerant Computing, 'Highlights from Twenty-Five Years'.

[30] Brian Randell,et al. Error recovery in asynchronous systems , 1986, IEEE Transactions on Software Engineering.

[31] Richard Koo,et al. Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.

[32] David L. Russell,et al. State Restoration in Systems of Communicating Processes , 1980, IEEE Transactions on Software Engineering.

[33] Andrea Bondavalli,et al. A Cost-Effective and Flexible Scheme for Software Fault Tolerance , 1993 .

[34] P. M. Melliar-Smith,et al. A program structure for error detection and recovery , 1974, Symposium on Operating Systems.

[35] Kwang-Hae Kim,et al. Approaches to implementation of a repairable distributed recovery block scheme , 1988, [1988] The Eighteenth International Symposium on Fault-Tolerant Computing. Digest of Papers.

[36] Roy H. Campbell,et al. Atomic actions for fault-tolerance using CSP , 1986, IEEE Transactions on Software Engineering.