The PSTR/SNS Scheme for Real-Time Fault Tolerance via Active Object Replication and Network Surveillance

The TMO (Time-triggered Message-triggered Object) scheme was formulated as a major extension of conventional object structuring schemes, with the idealistic goal of facilitating both general-form design and timeliness-guaranteed design of complex real-time application systems. We recently formulated and demonstrated the PSTR (Primary-Shadow TMO Replication) scheme, a new scheme for realizing TMO-structured distributed and parallel computer systems capable of both hardware and software fault tolerance. This paper discusses an important new extension: the integration of the PSTR scheme with a network surveillance (NS) scheme, which significantly improves both the fault coverage and the recovery time bound achieved. The NS scheme adopted is the recently developed SNS (Supervisor-based Network Surveillance) scheme, which is effective in a wide range of point-to-point networks. The integrated scheme is called the PSTR/SNS scheme. Its recovery time bound is analyzed on the basis of an implementation model that can be easily adapted to various commercial operating system kernels.
