Tolerance to Multiple Transient Faults for Aperiodic Tasks in Hard Real-Time Systems

Real-time systems are increasingly used in time-critical applications. Fault tolerance is an essential requirement of such systems, because failing to tolerate faults can have catastrophic consequences. In this paper, we study a scheme that guarantees timely recovery from multiple faults within hard real-time constraints in uniprocessor systems. Assuming earliest-deadline-first (EDF) scheduling of aperiodic preemptive tasks, we develop a necessary and sufficient feasibility-check algorithm for fault-tolerant scheduling with complexity O(n^2 · κ), where n is the number of tasks to be scheduled and κ is the maximum number of faults to be tolerated.
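The abstract does not spell out the feasibility check itself. As a point of reference only, the sketch below covers the fault-free baseline (κ = 0): it simulates preemptive EDF on one processor and reports whether every aperiodic task meets its deadline. It is not the paper's fault-tolerant algorithm, which must also account for recovery time after up to κ faults. The task representation (release time, execution time, absolute deadline) and the name `edf_feasible` are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple

# Hypothetical task model: (release time, execution time, absolute deadline).
Task = Tuple[int, int, int]


def edf_feasible(tasks: List[Task]) -> bool:
    """Simulate fault-free preemptive EDF on a uniprocessor.

    Returns True if every task finishes by its deadline. This is only the
    kappa = 0 baseline, not the fault-tolerant check described in the paper.
    """
    pending = sorted(tasks)  # ordered by release time
    ready: list = []         # min-heap of [deadline, remaining execution time]
    time, i, n = 0, 0, len(pending)

    while i < n or ready:
        # If nothing is ready, idle until the next release.
        if not ready and pending[i][0] > time:
            time = pending[i][0]

        # Admit every task released by the current time.
        while i < n and pending[i][0] <= time:
            _, exec_time, deadline = pending[i]
            heapq.heappush(ready, [deadline, exec_time])
            i += 1

        # Run the earliest-deadline task until it finishes or a new release
        # arrives, at which point the scheduling decision is re-evaluated.
        deadline, remaining = heapq.heappop(ready)
        next_release = pending[i][0] if i < n else float("inf")
        run = min(remaining, next_release - time)
        time += run
        remaining -= run

        if remaining > 0:
            heapq.heappush(ready, [deadline, remaining])  # preempted
        elif time > deadline:
            return False  # finished after its deadline

    return True
```

For example, `edf_feasible([(0, 2, 4), (1, 1, 2)])` accepts a set in which the second task preempts the first, while `edf_feasible([(0, 3, 3), (0, 1, 3)])` rejects a set that demands four time units of work within three.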
