Optimizing resources in real-time scheduling for fault tolerant processors

Safety-critical systems used in avionics, nuclear power plants and emergency medical equipment have to meet stringent reliability and timing demands. Such demands are met with fault-tolerance mechanisms such as hardware and software redundancy. In this paper, we consider a safety-critical application, the dual-redundant onboard computer (OBC) system of the Indian Satellite Launch Vehicle, and propose a scheme to optimize the onboard computing resources without compromising the system reliability requirements. Redundancy is handled at the task allocation level, and the slack this generates is used to allocate additional computational tasks, which makes the scheme attractive for efficient resource management. Task allocation combined with real-time scheduling using Rate Monotonic (RM) and Earliest Deadline First (EDF) provides more programming flexibility and uses the system resources efficiently. When implemented, the scheme gives an efficient offline task allocation for fault-free conditions and a flexible fault-tolerance strategy during processor failure. The proposed scheme is compared with a traditional dual-redundant scheme. The implementation is tested in simulation and evaluated using performance metrics to illustrate the enhanced performance capability of the approach. Extended to multiprocessors with generic features, the scheme could yield substantial gains in performance and cost. The contribution of this work is a system-level algorithm for real-time task allocation and scheduling.
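As an illustration of the RM and EDF scheduling mentioned above, the following minimal sketch applies the classical utilization-based tests of Liu and Layland to a periodic task set and reports the remaining slack. It is not the paper's allocation algorithm. It assumes independent, preemptive periodic tasks with deadlines equal to periods on a single processor, and the task sets and function names are hypothetical.

```python
from typing import List, Tuple

# A task is (C, T): worst-case execution time and period.
# Assumption: deadline equals period, tasks are independent and preemptive.
Task = Tuple[float, float]


def utilization(tasks: List[Task]) -> float:
    """Total processor utilization U = sum(C_i / T_i)."""
    return sum(c / t for c, t in tasks)


def rm_bound(n: int) -> float:
    """Liu and Layland utilization bound for n tasks under RM."""
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) RM test: U <= n(2^(1/n) - 1)."""
    return utilization(tasks) <= rm_bound(len(tasks))


def edf_schedulable(tasks: List[Task]) -> bool:
    """Exact EDF test under the stated assumptions: U <= 1."""
    return utilization(tasks) <= 1.0


def slack(tasks: List[Task]) -> float:
    """Unused processor fraction, available for additional tasks."""
    return 1.0 - utilization(tasks)


if __name__ == "__main__":
    light = [(1, 4), (1, 5), (2, 10)]   # hypothetical task set
    heavy = [(2, 4), (1, 5), (2, 10)]   # hypothetical task set
    for name, ts in (("light", light), ("heavy", heavy)):
        print(name, round(utilization(ts), 3),
              rm_schedulable(ts), edf_schedulable(ts), round(slack(ts), 3))
```

A task set that fails the RM bound is not necessarily unschedulable under RM, because the bound is only sufficient; an exact response-time analysis would be needed to decide such cases.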
