Fault-Tolerant Rate-Monotonic Scheduling

Due to the critical nature of the tasks in hard real-time systems, it is essential that faults be tolerated. In this paper, we present a scheme that tolerates faults during the execution of preemptive real-time tasks. We describe a recovery scheme that re-executes tasks in the event of single and multiple transient faults, and we discuss conditions that any such recovery scheme must meet. We then extend the original Rate Monotonic Scheduling (RMS) scheme and the exact characterization of RMS to provide tolerance for single and multiple transient faults. We derive schedulability bounds for sets of real-time tasks given the desired level of fault tolerance for each task or subset of tasks. Finally, we analyze those bounds and compare them with existing bounds for non-fault-tolerant RMS and for other variations of RMS.
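
To make the two kinds of test mentioned above concrete, the following is a minimal sketch, not the derivation from this paper. It shows the classic utilization bound n(2^(1/n) - 1) for non-fault-tolerant RMS and an iterative response-time test in the spirit of the exact characterization. The `faults` parameter and the way recovery is charged (each fault costs one re-execution of the longest task at the same or higher priority) are assumptions made for illustration only, as are the function names and the task set.

```python
def liu_layland_bound(n):
    """Classic RMS utilization bound for n tasks (no fault tolerance)."""
    return n * (2 ** (1.0 / n) - 1)


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def response_time(tasks, i, faults=0):
    """Worst-case response time of task i, or None if it misses its deadline.

    tasks  : list of (C, T) integer pairs sorted by increasing period,
             i.e. in rate-monotonic priority order; deadline equals period.
    faults : ASSUMPTION for illustration: number of transient faults to
             tolerate, each charged as one re-execution of the longest
             task among task i and the higher-priority tasks.
    """
    c_i, t_i = tasks[i]
    recovery = faults * max(c for c, _ in tasks[: i + 1])
    r = c_i + recovery
    while r <= t_i:
        # Demand = own work + recovery budget + higher-priority interference.
        demand = c_i + recovery + sum(ceil_div(r, t) * c for c, t in tasks[:i])
        if demand == r:
            return r  # fixed point reached: this is the response time
        r = demand
    return None  # demand exceeded the deadline


# Hypothetical task set: (execution time, period)
tasks = [(1, 4), (2, 6), (3, 12)]
utilization = sum(c / t for c, t in tasks)

print(utilization > liu_layland_bound(len(tasks)))  # bound alone is inconclusive
print([response_time(tasks, i) for i in range(len(tasks))])
print([response_time(tasks, i, faults=1) for i in range(len(tasks))])
```

For this hypothetical task set the utilization (about 0.83) exceeds the three-task bound (about 0.78), yet the response-time test shows every task meeting its deadline without faults. Under the assumed recovery model, reserving time for a single re-execution makes the lowest-priority task miss its deadline, which illustrates why fault tolerance lowers the achievable schedulability bound.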
