Analysis of a fault-tolerant multiprocessor scheduling algorithm

Fault tolerance is an important aspect of real-time computer systems, since timing constraints must not be violated. In multiprocessor systems, fault tolerance becomes an even greater requirement, since there are more components that can fail. We present the analysis of a fault-tolerant scheduling algorithm for real-time applications on multiprocessors. The algorithm is based on the principles of primary/backup tasks, backup overloading (i.e., scheduling more than one backup in the same time interval), and backup deallocation (i.e., reclaiming the resources reserved for backup tasks when operation is fault-free). A theoretical model is developed to study a particular class of applications and certain backup and overloading strategies. The proposed scheme can tolerate a single fault, transient or permanent, of any processor at any time. Simulation results offer evidence that adding the fault-tolerance capability causes little loss of schedulability. Simulation is also used to study the time the system needs to recover from a fault (i.e., the time until it is again able to tolerate any fault).
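
The two mechanisms named above, backup overloading and backup deallocation, can be illustrated with a minimal Python sketch. It is not the algorithm analyzed in the paper: the abstract does not state the placement rule, so the rule used here is an assumption derived from the single-fault model (if only one processor fails at a time, two backups whose primaries run on different processors never both need to execute, so they may share a time slot). The names `Backup`, `can_place`, and `deallocate`, and the half-open interval convention, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    """A reserved backup slot (hypothetical structure, for illustration only)."""
    task: str
    primary_proc: int  # processor that runs the primary copy
    backup_proc: int   # processor on which the backup slot is reserved
    start: int         # slot is the half-open interval [start, end)
    end: int


def intervals_overlap(a: Backup, b: Backup) -> bool:
    """True if the two half-open time intervals share at least one instant."""
    return a.start < b.end and b.start < a.end


def can_place(new: Backup, scheduled: list[Backup]) -> bool:
    """Assumed overloading rule under a single-fault model.

    A new backup may overlap existing backups on the same processor only if
    no overlapping backup has its primary on the same processor as the new
    task's primary; otherwise one fault could activate both backups at once.
    """
    if new.primary_proc == new.backup_proc:
        # A backup on the primary's own processor would fail together with it.
        return False
    for old in scheduled:
        same_slot = (old.backup_proc == new.backup_proc
                     and intervals_overlap(old, new))
        if same_slot and old.primary_proc == new.primary_proc:
            return False
    return True


def deallocate(scheduled: list[Backup], finished_task: str) -> list[Backup]:
    """Backup deallocation: drop the slot once the primary finished fault-free."""
    return [b for b in scheduled if b.task != finished_task]


if __name__ == "__main__":
    scheduled = [Backup("T1", primary_proc=0, backup_proc=2, start=10, end=15)]
    candidate = Backup("T2", primary_proc=1, backup_proc=2, start=12, end=17)
    print(can_place(candidate, scheduled))
    print(deallocate(scheduled, "T1"))
```

In this sketch the candidate `T2` overlaps `T1` on processor 2, which is acceptable only because their primaries sit on different processors. A real scheduler would also have to check deadlines and the primary schedule, which are left out here.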
