A Time-Triggered Scheduling Algorithm for Active Diagnosis in Heterogeneous Distributed Systems

Many safety-critical control applications now use distributed embedded systems that consist of sensors, actuators, processors and a fully connected network, with no mechanical backup. One example is the fly-by-wire control system used in modern aircraft. These systems require a high level of reliability, safety and performance. Reliability is often pursued by adding redundant components, but this is not a viable solution because it increases the overall size of the system along with its cost and power consumption. A better approach is to continuously monitor the system so that the dependability of the overall system is greater than that of its hardware and software components. Active diagnosis is one such technique: it monitors the operation of the system components at runtime for fault isolation and error recovery. Because these systems are highly safety-critical, i.e. their failure can result in loss of life, it is essential that the analysis of the diagnostic information and the subsequent recovery from the fault are performed within a predictable time. Scheduling the diagnostic tasks onto the system is an important aspect of this timely analysis, so the present work proposes a time-triggered static scheduler for a heterogeneous distributed system that uses diagnostic queries and a real-time database to find faults. The proposed algorithm calculates the points in time at which each diagnostic query is executed and at which data is replicated to or deleted from the real-time database. This a priori knowledge bounds the time for fault identification and results in a realizable diagnostic framework. The algorithm uses a specific priority scheme to account for the heterogeneity of the system and schedules the diagnostic tasks while respecting their precedence and periodicity constraints. It has been implemented as a prototype and evaluated experimentally.
The paper demonstrates that time-triggered scheduling can be successfully incorporated in a multi-query based diagnostic environment to increase the reliability and performance, and to ensure the safety, of a heterogeneous distributed system.
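
The abstract does not spell out the priority scheme, so the following is only a minimal sketch of how a static, time-triggered schedule table might be computed offline for periodic tasks with precedence constraints on heterogeneous processors. It swaps in a generic HEFT-style upward rank (mean execution cost plus the longest path to a sink) as the priority, which is an assumption and not the paper's scheme. The function names, the task names, and the simplifications (no communication cost, dependent tasks share one period, deadline equal to period, no slot insertion) are all hypothetical.

```python
from math import lcm  # multi-argument lcm needs Python 3.9+


def upward_rank(tasks, succs):
    """Assumed priority: mean cost of a task plus the longest ranked path below it."""
    rank = {}

    def visit(name):
        if name not in rank:
            costs = tasks[name][1]
            mean = sum(costs) / len(costs)
            below = max((visit(s) for s in succs.get(name, [])), default=0.0)
            rank[name] = mean + below
        return rank[name]

    for name in tasks:
        visit(name)
    return rank


def build_schedule(tasks, preds, n_procs):
    """Return a sorted table of (start, finish, processor, task, job_index).

    tasks: name -> (period, [execution cost on each processor]), costs > 0
    preds: name -> list of predecessor names (assumed to share the same period)
    """
    succs = {}
    for name, ps in preds.items():
        for p in ps:
            succs.setdefault(p, []).append(name)
    rank = upward_rank(tasks, succs)

    # Unroll every periodic task into jobs over one hyperperiod.
    hyper = lcm(*(period for period, _ in tasks.values()))
    jobs = [(name, k)
            for name, (period, _) in tasks.items()
            for k in range(hyper // period)]
    # Earlier release first; among equal releases, higher rank first.
    # With positive costs a predecessor always outranks its successors.
    jobs.sort(key=lambda j: (j[1] * tasks[j[0]][0], -rank[j[0]]))

    proc_free = [0] * n_procs  # time at which each processor becomes idle
    finish = {}
    table = []
    for name, k in jobs:
        period, costs = tasks[name]
        release = k * period
        ready = max([release] + [finish[(p, k)] for p in preds.get(name, [])])
        # Heterogeneity: pick the processor giving the earliest finish time.
        best = min(range(n_procs),
                   key=lambda p: max(ready, proc_free[p]) + costs[p])
        start = max(ready, proc_free[best])
        end = start + costs[best]
        if end > release + period:
            raise ValueError(f"job {name}[{k}] misses its deadline")
        proc_free[best] = end
        finish[(name, k)] = end
        table.append((start, end, best, name, k))
    return sorted(table)


# Hypothetical task set: a data collection step feeding a diagnostic query,
# plus a slower database replication task, on two unequal processors.
example_tasks = {
    "collect": (10, [2, 3]),
    "query": (10, [4, 2]),
    "replicate": (20, [3, 3]),
}
example_preds = {"query": ["collect"]}
schedule = build_schedule(example_tasks, example_preds, n_procs=2)
```

Because the table is fixed before runtime, the finish time of every query job is known in advance, which is the property the abstract relies on to bound fault identification time. A real implementation would also have to model network messages and database replication slots, which this sketch leaves out.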
