Effects of online fault detection mechanisms on Probabilistic Timing Analysis

In real-time systems, random caches have been proposed as a way to simplify software timing analysis by avoiding the corner cases usually found in deterministic systems. With this randomized approach, one can obtain an application's probabilistic Worst-Case Execution Time (pWCET) for use in timing analysis. As in deterministic systems, technology scaling makes transient and permanent faults in cache memories more likely, which in turn affects the system's timing behavior. To mitigate these effects, one can introduce a detection mechanism that classifies each fault as transient or permanent, so that permanently faulty cache blocks can be disabled and are not accessed again. In this paper, we compare the effects of two online detection mechanisms for permanent faults, rule-based detection and Dynamic Hidden Markov Model (D-HMM) based detection, on the generation of safe pWCET estimates. Experimental results show that the choice of mechanism can greatly affect safe pWCET margins, and that D-HMM based detection improves the system's pWCET compared to rule-based detection.
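To make the idea of classifying faults and disabling blocks concrete, the following is a minimal sketch of one possible rule-based detector. It is not the rule evaluated in the paper, and it does not model D-HMM. The class name `RuleBasedDetector`, the `threshold` and `window` parameters, and the rule itself (declare a block permanently faulty once enough errors appear among its most recent accesses) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class RuleBasedDetector:
    """Hypothetical threshold rule, for illustration only.

    A cache block is declared permanently faulty once `threshold` errors
    are observed within its last `window` accesses. Isolated errors are
    treated as transient.
    """

    def __init__(self, threshold=3, window=8):
        self.threshold = threshold
        self.window = window
        # Per-block sliding window of recent error flags (assumed bookkeeping).
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.disabled = set()

    def record_access(self, block, error):
        """Record one access to `block` and return its classification."""
        if block in self.disabled:
            return "disabled"
        recent = self.history[block]
        recent.append(bool(error))
        if sum(recent) >= self.threshold:
            # Too many recent errors: treat the fault as permanent
            # and disable the block so it is not used again.
            self.disabled.add(block)
            return "permanent"
        return "transient" if error else "ok"


if __name__ == "__main__":
    detector = RuleBasedDetector(threshold=3, window=4)
    for err in (True, False, True, True, False):
        print(detector.record_access(0, err))
```

Under this assumed rule, a larger `threshold` disables fewer blocks on transient faults but tolerates more erroneous accesses to a block that is in fact permanently faulty. How such choices affect the safe pWCET margin is the kind of trade-off the abstract refers to.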
