Mapping of Fault-Tolerant Applications with Transparency on Distributed Embedded Systems

In this paper we present an approach to mapping optimization of fault-tolerant embedded systems for safety-critical applications. Processes and messages are statically scheduled, and process re-execution is used to recover from multiple transient faults. We call process recovery transparent if it does not affect the operation of other processes. Transparent recovery offers fault containment, improved debuggability, and reduced memory requirements for storing the fault-tolerant schedules. However, it introduces additional delays that can cause the application to violate its timing constraints. We propose an algorithm for mapping fault-tolerant applications with transparency. The algorithm maps processes onto computation nodes such that the application is schedulable and the transparency properties imposed by the designer are satisfied. The mapping algorithm is driven by a heuristic that estimates the worst-case schedule length and indicates whether a given mapping alternative is schedulable.
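To make the idea of judging a mapping by its worst-case schedule length more concrete, here is a minimal sketch under strong simplifying assumptions. It is not the heuristic from the paper. It ignores precedence constraints, messages, and transparency, and assumes each node shares one recovery slack among its processes. Under those assumptions, the worst case on a node is when all k faults hit its longest process, each re-execution costing that process's WCET plus a recovery overhead `mu`. The names `estimate_worst_case_length`, `is_schedulable`, and `best_mapping` are hypothetical, and the exhaustive search is a stand-in for a real mapping heuristic that only works for tiny instances.

```python
from collections import defaultdict
from itertools import product


def estimate_worst_case_length(mapping, wcet, k, mu):
    """Hypothetical, simplified worst-case schedule length estimate.

    mapping: dict process -> node
    wcet:    dict (process, node) -> worst-case execution time
    k:       maximum number of transient faults to tolerate
    mu:      recovery overhead added to each re-execution
    Assumes independent processes, no messages, and recovery slack shared
    by all processes on a node (no transparency constraints).
    """
    per_node = defaultdict(list)
    for proc, node in mapping.items():
        per_node[node].append(wcet[(proc, node)])
    lengths = {}
    for node, times in per_node.items():
        # With shared slack, the worst case is all k faults hitting the
        # longest process on this node, each re-execution costing C_max + mu.
        lengths[node] = sum(times) + k * (max(times) + mu)
    # The schedule is only as short as its slowest node.
    return max(lengths.values(), default=0)


def is_schedulable(mapping, wcet, k, mu, deadline):
    """Accept a mapping if the estimated worst case meets the deadline."""
    return estimate_worst_case_length(mapping, wcet, k, mu) <= deadline


def best_mapping(processes, nodes, wcet, k, mu):
    """Exhaustive search over all mappings (illustration only, exponential)."""
    best, best_len = None, float("inf")
    for assignment in product(nodes, repeat=len(processes)):
        mapping = dict(zip(processes, assignment))
        # Skip mappings that place a process on a node it cannot run on.
        if any((p, n) not in wcet for p, n in mapping.items()):
            continue
        length = estimate_worst_case_length(mapping, wcet, k, mu)
        if length < best_len:
            best, best_len = mapping, length
    return best, best_len
```

For example, with P1 and P2 on N1 (WCETs 30 and 20) and P3 on N2 (WCET 40), k = 2 and mu = 5, node N1 needs 50 + 2 * (30 + 5) = 120 and N2 needs 40 + 2 * (40 + 5) = 130. The estimate is therefore 130, which meets a deadline of 150 but not one of 120. A practical mapping algorithm would replace the exhaustive loop with a guided search and a far more precise schedule-length estimate that accounts for dependencies, communication, and which processes the designer requires to recover transparently.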
