Fast Algorithms for Testing Fault-Tolerance of Sequenced Jobs with Deadlines

In queue-based scheduling systems, jobs are executed according to a predefined sequential plan. During execution, faults may occur that cause jobs to re-execute, delaying the whole schedule. It is therefore important to determine, in real time, whether a given set of pre-ordered jobs is fault-tolerant, that is, whether all jobs will always meet their deadlines. Such a test makes it possible, for instance, to decide online whether to admit a new urgent job into the queue while still guaranteeing that the whole schedule remains fault-tolerant. Our goal in this work is to design efficient algorithms for testing the fault tolerance of sequenced jobs in the presence of transient faults. We consider different fault models, which specify which fault patterns are allowed to occur and how soon failed jobs can be restarted. For each fault model we provide efficient algorithms that determine the feasibility of all jobs in the schedule. Our algorithms are exact and run in time linear in the number of jobs (deterministically, or with very high probability, depending on the fault model), and thus can be used to make real-time decisions.
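To make the kind of test described above concrete, here is a minimal sketch in Python for one deliberately simple fault model. The model is an assumption made for illustration and is not taken from the paper: jobs run back to back starting at time 0, at most `k` transient faults occur in total, a fault wastes at most one full execution of the job it hits, and the failed job restarts immediately. Under these assumptions the worst case for job i is that all `k` faults hit the longest job among the first i, so a running sum and a running maximum are enough for a single linear pass. The function name `is_fault_tolerant` and the `(length, deadline)` tuple layout are hypothetical.

```python
from typing import Sequence, Tuple


def is_fault_tolerant(jobs: Sequence[Tuple[float, float]], k: int) -> bool:
    """Check a fixed job sequence against at most k faults (illustrative model).

    Each job is a (length, deadline) pair, listed in execution order.
    Assumed model (not from the paper): a fault costs at most one full
    re-execution of the job it hits, and restarts are immediate.
    """
    elapsed = 0.0   # fault-free completion time of the current job
    longest = 0.0   # longest job seen so far in the sequence
    for length, deadline in jobs:
        elapsed += length
        longest = max(longest, length)
        # Worst case: all k faults strike the longest job executed so far.
        if elapsed + k * longest > deadline:
            return False
    return True
```

For example, with jobs `[(2, 10), (3, 12), (1, 13)]` the schedule tolerates two faults under this model but not three, because three re-executions of the length-3 job push the second job past its deadline of 12. An admission decision for a new urgent job could then be sketched as re-running the check on the sequence with that job inserted.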
