Non-intrusive System Level Fault-Tolerance

High-integrity embedded systems operate in multiple modes to ensure system availability in the face of faults. Unanticipated state-dependent faults that remain in software after system design and development behave like transient hardware faults: they appear, do damage, and disappear. The conventional approach to handling task overruns caused by transient faults is to switch to a single recovery task that implements minimal functionality. This approach provides limited availability and should be used only as a last resort to keep the system online. Traditional fault detection approaches are often intrusive in that they consume processor resources to monitor system behavior. This paper presents a novel approach to fault monitoring that leverages the Ravenscar profile, model checking, and a system-on-chip implementation of both the kernel and an execution-time monitor. System fault tolerance is provided through a hierarchical set of operational modes that are based on timing-behavior violations by individual tasks within the application. The approach is illustrated through a simple case study of a generic navigation system.
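As a rough illustration of how per-task timing violations might drive a choice among hierarchical operational modes, consider the minimal sketch below. It is not the paper's design: the mode names (`NORMAL`, `DEGRADED`, `RECOVERY`), the `critical` flag, the task names, and the budgets are all hypothetical, and the check runs in software here, whereas the paper places the execution-time monitor on-chip precisely so that monitoring does not consume processor time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    budget_ms: float  # assumed execution-time budget for one release
    critical: bool    # hypothetical flag: an overrun here forces the last-resort mode


def overrunning(tasks, observed_ms):
    """Return the names of tasks whose observed execution time exceeds their budget."""
    return [t.name for t in tasks if observed_ms.get(t.name, 0.0) > t.budget_ms]


def select_mode(tasks, observed_ms):
    """Pick a (hypothetical) operational mode from the set of overrunning tasks."""
    by_name = {t.name: t for t in tasks}
    late = overrunning(tasks, observed_ms)
    if not late:
        return "NORMAL"
    if any(by_name[name].critical for name in late):
        # Last resort: minimal functionality, analogous to a single recovery task.
        return "RECOVERY"
    # Only non-critical tasks overran: shed them but keep the rest running.
    return "DEGRADED"


# Hypothetical task set for a generic navigation system.
TASKS = [
    Task("gps_fix", budget_ms=4.0, critical=True),
    Task("map_display", budget_ms=10.0, critical=False),
]


def demo():
    """Evaluate three illustrative sets of observed execution times."""
    return [
        select_mode(TASKS, {"gps_fix": 3.0, "map_display": 8.0}),
        select_mode(TASKS, {"gps_fix": 3.0, "map_display": 12.5}),
        select_mode(TASKS, {"gps_fix": 5.0, "map_display": 8.0}),
    ]


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only that an intermediate mode can preserve more functionality than an immediate switch to a single recovery task; how the modes are actually structured and verified is described in the paper itself.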
