Implementation of the Conversation Scheme in Message-Based Distributed Computer Systems

Several approaches for implementing conversations in message-based distributed computer systems (DCSs) are discussed. Two exit control strategies (synchronous and asynchronous) and three approaches to executing the conversation acceptance test (centralized, decentralized, and semicentralized) are examined and compared in terms of system performance and implementation cost. An efficient approach to run-time management of recovery information, based on an extension of the recovery cache scheme, is also discussed. The two major types of conversation structures, name-linked recovery block conversations and abstract data type conversations, are analyzed to determine which execution approaches are most efficient for each. As a case study, an unmanned vehicle system illustrates how the approaches can be applied in a realistic real-time application.
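
The abstract names these mechanisms without detailing them. As a rough illustration of the classic conversation semantics, where all participants establish a recovery point on entry and roll back together if the joint acceptance test fails, the following Python sketch models one combination: synchronous exit with a centralized acceptance test, backed by a plain recovery cache. It is not the paper's extended recovery cache, nor its decentralized or semicentralized tests, nor its asynchronous exit. `RecoveryCache`, `run_conversation`, and the in-memory dictionaries standing in for each process's local state are hypothetical names chosen for this example; message passing between processes is not modeled, and the rule that attempt k runs alternate k of every participant is a simplifying assumption.

```python
from typing import Any, Callable, Dict, List, Set


class RecoveryCache:
    """Basic recovery cache for one participant's local state (illustrative).

    The first time a variable is overwritten inside the conversation, its
    entry value is saved so the state can later be rolled back.
    """

    def __init__(self, store: Dict[str, Any]) -> None:
        self.store = store
        self._saved: Dict[str, Any] = {}  # entry values of overwritten keys
        self._absent: Set[str] = set()    # keys created inside the conversation

    def write(self, key: str, value: Any) -> None:
        # Record the prior value only on the first write to this key.
        if key not in self._saved and key not in self._absent:
            if key in self.store:
                self._saved[key] = self.store[key]
            else:
                self._absent.add(key)
        self.store[key] = value

    def restore(self) -> None:
        # Roll the store back to its state at the recovery point.
        for key, old in self._saved.items():
            self.store[key] = old
        for key in self._absent:
            self.store.pop(key, None)
        self.discard()

    def discard(self) -> None:
        # Commit: the recorded recovery data is no longer needed.
        self._saved.clear()
        self._absent.clear()


Alternate = Callable[[RecoveryCache], None]


def run_conversation(
    stores: List[Dict[str, Any]],
    alternates: List[List[Alternate]],
    acceptance_test: Callable[[List[Dict[str, Any]]], bool],
) -> int:
    """Synchronous exit with a centralized acceptance test (hypothetical helper).

    Simplification: attempt k runs alternate k of every participant.
    Returns the index of the attempt whose results were accepted.
    """
    caches = [RecoveryCache(s) for s in stores]
    for attempt in range(min(len(a) for a in alternates)):
        try:
            for cache, alts in zip(caches, alternates):
                alts[attempt](cache)
            # No participant exits until a single test has checked all states.
            passed = acceptance_test(stores)
        except Exception:
            passed = False  # an error in any participant fails the attempt
        if passed:
            for cache in caches:
                cache.discard()
            return attempt
        # Failure: every participant rolls back to its recovery point together.
        for cache in caches:
            cache.restore()
    raise RuntimeError("all alternates failed; the conversation fails")


# Usage: participant A's primary produces a value the test rejects.
a, b = {"x": 1}, {"y": 2}
result = run_conversation(
    [a, b],
    [[lambda c: c.write("x", -5), lambda c: c.write("x", 10)],
     [lambda c: c.write("y", 20), lambda c: c.write("y", 20)]],
    lambda states: all(v >= 0 for s in states for v in s.values()),
)
print(result, a, b)  # prints 1 {'x': 10} {'y': 20}
```

Because the exit is synchronous and the test is centralized, a failure detected in any participant's results forces every participant back to its recovery point, even participants whose own results were acceptable. The recovery cache keeps that rollback cheap by saving only the variables actually modified inside the conversation rather than a full checkpoint of each process.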
