Mixed criticality scheduling in fault-tolerant distributed real-time systems

Modern safety-critical real-time systems are composed of tasks of mixed criticalities, and scheduling them in a fault-tolerant manner on a distributed platform is challenging. Fault tolerance is typically achieved through redundancy, most commonly temporal redundancy, in which an alternate task executes before the original deadline of the failed task. Additionally, reliability studies such as Zonal Hazard Analysis (ZHA) and Fault Hazard Analysis (FHA) may impose extra constraints on the re-executions, e.g., spatial separation of alternates, to improve reliability. In this paper, we present a method for scheduling mixed-criticality real-time tasks on a distributed platform in a fault-tolerant manner while taking into account the recommendations of reliability studies such as ZHA and FHA. First, we use mathematical optimization to allocate tasks to the processors. We then derive fault-tolerant feasibility windows for the critical tasks and fault-aware feasibility windows for the non-critical tasks. Finally, we derive scheduler-specific task attributes, such as priorities for a fixed-priority scheduler. Our method provides hard real-time fault tolerance guarantees for critical tasks while maximizing resource utilization for non-critical tasks.
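To make the allocation and feasibility-window steps concrete, the following is a minimal sketch and not the method of the paper. It assumes implicit deadlines (deadline equal to period), replaces the mathematical optimization with an exhaustive search, uses a crude per-processor utilization bound of 1 that pessimistically reserves capacity for every alternate, and models spatial separation as "the alternate runs on a different processor than the primary". All names (`Task`, `allocate`, `ft_window_end`) are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from itertools import product


@dataclass(frozen=True)
class Task:
    name: str
    wcet: int          # worst-case execution time of the primary
    period: int        # period; the deadline is assumed equal to the period
    critical: bool
    alt_wcet: int = 0  # worst-case execution time of the alternate (critical tasks only)


def allocate(tasks, n_procs):
    """Search for a placement of primaries and alternates (illustrative only).

    Returns a dict mapping task name to (primary_proc, alternate_proc), with
    alternate_proc set to None for non-critical tasks, or None if no placement
    satisfies the assumed constraints.
    """
    choices = []
    for t in tasks:
        if t.critical:
            # Assumed spatial-separation rule: alternate on a different processor.
            choices.append([(p, a) for p in range(n_procs)
                            for a in range(n_procs) if p != a])
        else:
            choices.append([(p, None) for p in range(n_procs)])

    for combo in product(*choices):
        load = [Fraction(0)] * n_procs
        for t, (p, a) in zip(tasks, combo):
            load[p] += Fraction(t.wcet, t.period)
            if a is not None:
                # Pessimistically reserve capacity for the alternate.
                load[a] += Fraction(t.alt_wcet, t.period)
        if all(u <= 1 for u in load):
            return {t.name: pa for t, pa in zip(tasks, combo)}
    return None


def ft_window_end(task):
    """Latest primary completion time (relative to release) that still leaves
    room for the alternate to finish before the deadline."""
    return task.period - task.alt_wcet


tasks = [
    Task("brake", wcet=1, period=4, critical=True, alt_wcet=1),
    Task("steer", wcet=2, period=8, critical=True, alt_wcet=2),
    Task("log", wcet=1, period=4, critical=False),
]
placement = allocate(tasks, n_procs=2)
```

In this sketch a critical task's primary must finish by `ft_window_end(task)` so that, if a fault is detected at that point, the alternate can still complete by the original deadline. With a single processor the separation rule cannot be met, so `allocate` returns `None`.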
