An adaptive scheme for fault-tolerant scheduling of soft real-time tasks in multiprocessor systems

The scheduling of real-time tasks with primary-backup-based fault-tolerance requirements has been an important problem for several years. Most known scheduling schemes are non-adaptive: they do not adapt to the dynamics of faults and task parameters in the system. In this paper, we propose an adaptive fault-tolerant scheduling scheme with a mechanism that controls the overlap interval between the primary and backup versions of each task so that the overall performance of the system improves. The overlap interval is determined from the observed fault rate and the task's soft laxity. We also propose a new performance index, called the SR index, that integrates schedulability (S) and reliability (R) into a single metric. To evaluate the proposed scheme, we conducted analytical and simulation studies under different fault and deadline scenarios, and found that the adaptive scheme adapts to system dynamics and offers a better SR index than the non-adaptive schemes.
