Fault-tolerant real-time objects

The use of real-time (RT) computer systems in safety-critical application fields, and the reliability expectations of the user community for such systems, have been growing fast in recent years. The reliability goals that customers have started imposing on such systems cannot be adequately met by relying on conventional design technologies, which tend to scale poorly and have too little in common with the general object-oriented (OO) design technology increasingly applied to the production of large-scale non-RT business and office data-processing software. To achieve a significant improvement in design efficiency and system reliability in real-time application fields, we believe it will be most rewarding and timely to establish the following types of technologies:
• General-form design style: Real-time computing must be realized as a generalization of non-real-time computing, rather than as something resembling an esoteric specialization [6].
• Design-time guarantee of timely service capabilities of subsystems: To meet the general public's demand for assured reliability of future real-time computing systems (RTCSs) in safety-critical applications, the system engineer must produce design-time guarantees for the timely service capabilities of the various subsystems (which will take the form of objects in object-oriented system designs), rather than relying on testing alone.
• Scalable time-bounded fault tolerance scheme: Ideally, fault detection and recovery actions must always be executed such that the intended output actions of real-time computations take place on time. This idealized type of fault tolerance, which aims to accomplish all critical actions (i.e., the output actions of critical real-time tasks) successfully in spite of component failures, is called action-level fault tolerance (ALFT) [5]. The timing properties of a scheme for achieving ALFT should be easily analyzable; in particular, such a scheme must yield a small recovery time-bound for non-negligible types of fault scenarios. Desirable fault tolerance techniques must also be scalable, in that they must be applicable to distributed and/or parallel computer systems of various sizes.
There may be many approaches to realizing each of these three goals, but few of them have been concretely demonstrated. Moreover, what we ultimately need is an integrated design technology that meets all three goals at once. Establishing such a technology is among the most challenging open research issues in reliable real-time computing. In this article, we state some major issues and establish some feasible directions to search for such …
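To make the ALFT timing requirement and the idea of a design-time guarantee more concrete, the following Python sketch checks, before any testing, whether each critical output action still meets its deadline when a single fault must first be detected and then recovered from. The timing model (worst-case execution time, plus a detection-latency bound, plus a recovery-time bound, summed sequentially) and all names (`CriticalAction`, `worst_case_output_time`, `alft_guaranteed`, `unguaranteed_actions`) are hypothetical simplifications chosen for illustration; they are not the article's own analysis or any particular system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalAction:
    """Timing parameters of one critical output action, in milliseconds.

    Hypothetical model for illustration only; not the article's own analysis.
    """
    name: str
    wcet: float             # worst-case execution time of the normal computation
    deadline: float         # output must occur within this time after activation
    detection_bound: float  # worst-case latency to detect a component failure
    recovery_bound: float   # worst-case time from detection to a correct output


def worst_case_output_time(action: CriticalAction) -> float:
    """Worst-case output time if a single fault strikes as late as possible.

    Assumes the fault surfaces only at the end of the normal computation,
    and that detection and recovery then run one after the other.
    """
    return action.wcet + action.detection_bound + action.recovery_bound


def alft_guaranteed(action: CriticalAction) -> bool:
    """True if the output is on time even in the single-fault worst case."""
    return worst_case_output_time(action) <= action.deadline


def unguaranteed_actions(actions: list[CriticalAction]) -> list[str]:
    """Design-time check: names of actions whose on-time output is not guaranteed."""
    return [a.name for a in actions if not alft_guaranteed(a)]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show one passing and one failing case.
    actions = [
        CriticalAction("valve_close", wcet=4.0, deadline=10.0,
                       detection_bound=2.0, recovery_bound=3.0),
        CriticalAction("alarm_update", wcet=6.0, deadline=10.0,
                       detection_bound=2.0, recovery_bound=3.0),
    ]
    for a in actions:
        print(a.name, worst_case_output_time(a), alft_guaranteed(a))
    # prints "valve_close 9.0 True" then "alarm_update 11.0 False"
    print("not guaranteed:", unguaranteed_actions(actions))
    # prints "not guaranteed: ['alarm_update']"
```

Because the check uses only worst-case bounds, it can be applied at design time rather than discovered through testing. It also shows why a small recovery time-bound matters: in this example, reducing `recovery_bound` for `alarm_update` from 3.0 to 2.0 ms would be enough for its output to be guaranteed on time as well.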

[1] Ammar Attoui et al. An object oriented model for parallel and reactive systems. Proceedings of the Twelfth Real-Time Systems Symposium, 1991.

[2] K. H. Kim et al. A timeliness-guaranteed kernel model (DREAM kernel) and implementation techniques. Proceedings of the Second International Workshop on Real-Time Computing Systems and Applications, 1995.

[3] Hideyuki Tokuda et al. An object-oriented real-time programming language. Computer, 1992.

[4] K. H. Kim et al. A real-time object model RTO.k and an experimental investigation of its potentials. Proceedings of the Eighteenth Annual International Computer Software and Applications Conference (COMPSAC 94), 1994.

[5] Steven Howell et al. Distinguishing features and potential roles of the RTO.k object model. Proceedings of WORDS '94, the First Workshop on Object-Oriented Real-Time Dependable Systems, 1994.

[6] Farokh B. Bastani et al. Toward dependable safety-critical software. Proceedings of WORDS '96, the Second Workshop on Object-Oriented Real-Time Dependable Systems, 1996.

[7] K. H. Kim et al. Action-level fault tolerance. 1995.

[8] K. H. Kim et al. The DREAM library support for PCD and RTO.k programming in C++. Proceedings of WORDS '96, the Second Workshop on Object-Oriented Real-Time Dependable Systems, 1996.

[9] Hermann Kopetz et al. TTP - A time-triggered protocol for fault-tolerant real-time systems. Proceedings of FTCS-23, the Twenty-Third International Symposium on Fault-Tolerant Computing, 1993.

[10] Algirdas Avizienis et al. Software fault tolerance. IFIP Congress, 1989.

[11] Brian Randell et al. The evolution of the recovery block concept. 1994.

[12] K. H. Kim et al. A distributed fault tolerant architecture for nuclear reactor and other critical process control applications. Digest of Papers, Fault-Tolerant Computing: The Twenty-First International Symposium, 1991.