Control-Quality Optimization for Distributed Embedded Systems with Adaptive Fault Tolerance

In this paper, we propose a design framework for distributed embedded control systems that ensures reliable execution and high quality of control even if some computation nodes fail. When a node fails, the configuration of the underlying distributed system changes and the system must adapt to this new situation by activating tasks at operational nodes. The task mapping as well as schedules and control laws that are customized for the new configuration influence the control quality and must, therefore, be optimized. The number of possible configurations due to faults is exponential in the number of nodes in the system. This design-space complexity leads to unaffordable design time and large memory requirements to store information related to mappings, schedules, and controllers. We demonstrate that it is sufficient to synthesize solutions for a small number of base and minimal configurations to achieve fault tolerance with an inherent minimum level of control quality. We also propose an algorithm to further improve control quality with a priority-based search of the set of configurations and trade-offs between task migration and replication.
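
To make the design-space growth concrete, the following minimal sketch enumerates configurations under a simplifying assumption that is ours, not necessarily the paper's formal definition: a configuration is modeled as any non-empty set of operational nodes. The names `configurations` and `count_configurations` and the node labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def configurations(nodes):
    """Yield every non-empty set of operational nodes.

    Assumption: a configuration is identified with the set of nodes that
    are still operational; the case where all nodes have failed is excluded.
    """
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            yield frozenset(subset)


def count_configurations(n):
    """Closed-form count of non-empty subsets of n nodes."""
    return 2 ** n - 1


if __name__ == "__main__":
    nodes = ["N1", "N2", "N3", "N4"]  # hypothetical node names
    configs = list(configurations(nodes))
    # The enumeration agrees with the closed form 2^n - 1.
    assert len(configs) == count_configurations(len(nodes))
    for n in (4, 8, 16):
        print(n, "nodes:", count_configurations(n), "configurations")
```

Under this model, each added node roughly doubles the number of configurations, which is why synthesizing and storing a mapping, schedule, and controller for every configuration quickly becomes impractical.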
