Leveraging Weakly-hard Constraints for Improving System Fault Tolerance with Functional and Timing Guarantees

Many safety-critical real-time systems operate in harsh environments and are subject to soft errors caused by transient or intermittent faults. Applying fault tolerance techniques in these systems is critical yet often very challenging, due to resource limitations and stringent constraints on timing and functionality. In this work, we leverage the concept of weakly-hard constraints, which allow task deadline misses in a bounded manner, to improve the system's capability to accommodate fault tolerance techniques while ensuring timing and functional correctness. In particular, we a) quantitatively measure control cost under different deadline hit/miss scenarios and identify weakly-hard constraints that guarantee control stability; b) employ typical worst-case analysis (TWCA) to bound the number of deadline misses and approximate system control cost; c) develop an event-based simulation method to check the task execution pattern and evaluate system control cost for any given solution; and d) develop a meta-heuristic algorithm that consists of heuristic methods and a simulated annealing procedure to explore the design space. Our experiments on an industrial case study and synthetic examples demonstrate the effectiveness of our approach.
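To make the notion of bounded deadline misses concrete, the following minimal sketch checks a recorded hit/miss pattern against a weakly-hard constraint. It assumes the common (m, K) formulation from the weakly-hard literature, in which at most m deadlines may be missed in any K consecutive jobs; the function name `satisfies_weakly_hard` and its parameters are hypothetical and do not come from this work.

```python
from typing import Sequence


def satisfies_weakly_hard(misses: Sequence[bool], m: int, k: int) -> bool:
    """Check an assumed (m, K) constraint: at most m misses in any k consecutive jobs.

    misses[i] is True when job i missed its deadline.
    """
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    in_window = 0  # misses among the most recent (up to k) jobs
    for i, missed in enumerate(misses):
        in_window += int(missed)
        if i >= k:
            # Job i - k has just left the sliding window.
            in_window -= int(misses[i - k])
        if in_window > m:
            return False
    return True


# Hypothetical pattern: one miss in every window of three jobs.
pattern = [False, True, False, False, True, False]
print(satisfies_weakly_hard(pattern, m=1, k=3))
```

A sliding-window count keeps the check linear in the length of the pattern, which matters if it is applied to long traces such as those an event-based simulation might produce.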
