Fast Algorithms for Testing Fault-Tolerance of Sequenced Jobs with Deadlines

In queue-based scheduling systems, jobs are executed according to a predefined sequential plan. During execution, faults may occur that cause jobs to re-execute, delaying the whole schedule. It is therefore important to determine, in real time, whether a given set of pre-ordered jobs is fault-tolerant, that is, whether all jobs meet their deadlines under every fault pattern that may occur. This makes it possible, for instance, to decide online whether to admit a new urgent job into the queue while still guaranteeing that the whole schedule remains fault-tolerant. Our goal in this work is to design efficient algorithms for testing the fault tolerance of sequenced jobs in the presence of transient faults. We consider different fault models that specify which fault patterns are allowed to occur and how soon failed jobs can be restarted. For each fault model, we provide efficient algorithms that determine the feasibility of all jobs in the schedule. Our algorithms are exact and run in time linear in the number of jobs (deterministically or with very high probability, depending on the fault model), so they can be used to make real-time decisions.
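To make the problem concrete, here is a minimal sketch of an exact linear-time test for one deliberately simple fault model. The model is an assumption made for illustration and is not taken from the paper: jobs run back to back in the given order from time 0; at most k transient faults occur in total; a fault may strike a job just before it completes, wasting its whole processing time; the job then restarts after a fixed recovery delay; and several faults may hit the same job. The function name `is_fault_tolerant` and the parameters `k` and `recovery` are hypothetical.

```python
from typing import Sequence


def is_fault_tolerant(
    proc: Sequence[float],
    deadlines: Sequence[float],
    k: int,
    recovery: float = 0.0,
) -> bool:
    """Exact O(n) test under an ASSUMED simple fault model (not the paper's).

    Assumptions: jobs run back to back in the given order from time 0; at most
    k faults occur in total; a fault can strike just before a job completes,
    wasting its full processing time; the job restarts after `recovery` time
    units; repeated faults on the same job are allowed.
    """
    if len(proc) != len(deadlines):
        raise ValueError("proc and deadlines must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")
    elapsed = 0.0  # fault-free completion time of the current job
    longest = 0.0  # longest processing time among jobs executed so far
    for p, d in zip(proc, deadlines):
        elapsed += p
        longest = max(longest, p)
        # Faults on later jobs cannot delay this job, so the worst case spends
        # all k faults on the longest job executed so far.
        worst = elapsed + k * (longest + recovery)
        if worst > d:
            return False
    return True


# Three queued jobs with processing times 2, 3, 1 and deadlines 5, 9, 12.
print(is_fault_tolerant([2, 3, 1], [5, 9, 12], k=1))
# Admission check: splice an urgent job (length 1, deadline 3) in at the front.
print(is_fault_tolerant([1, 2, 3, 1], [3, 5, 9, 12], k=1))
```

Under this assumed model, a single pass that keeps a running prefix sum and a running prefix maximum suffices, because the worst case for each job is fixed by those two quantities. Deciding whether to admit an urgent job is then just a rerun of the test on the modified sequence. The richer fault models the abstract mentions, which restrict fault patterns and restart times, need more involved algorithms.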