Static probabilistic timing analysis with a permanent fault detection mechanism

In recent years, random caches have been proposed as a way to simplify the timing analysis of real-time systems. However, technology scaling makes caches prone to faults. Fault detection mechanisms can detect permanent faults, but they affect the timing analysis of a random cache. This paper introduces a Static Probabilistic Timing Analysis (SPTA) technique that accounts for a permanent fault detection mechanism. The mechanism periodically checks caches for faults and disables faulty cache blocks to prevent future accesses. The SPTA method operates by periodically switching its runtime between the fault-detection and the no-fault-detection states. This is the first SPTA with a realistic permanent fault detection mechanism. Experiments show that the proposed method always provides safe timing estimates, even when few memory blocks are provided, and accurate results when sufficient memory blocks are present.
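
To illustrate why disabling faulty cache blocks matters for the analysis, the following minimal sketch uses a reuse-distance lower bound on the hit probability that is common in the SPTA literature for evict-on-miss random replacement. It is not the method proposed in this paper. The function name, its parameters, and the assumption of a fully associative cache in which each disabled block simply reduces the usable associativity are all hypothetical choices made for illustration.

```python
def hit_probability(reuse_distance: int, ways: int, faulty_ways: int = 0) -> float:
    """Lower bound on the hit probability of one access (illustrative only).

    Assumes a fully associative cache with evict-on-miss random replacement.
    reuse_distance: number of intervening accesses since the last access
                    to the same block (each may cause an eviction).
    ways:           number of cache blocks when fault-free.
    faulty_ways:    number of blocks disabled by the detection mechanism.
    """
    usable = ways - faulty_ways
    if usable <= 0:
        return 0.0  # no usable block: every access misses
    if reuse_distance >= usable:
        return 0.0  # conservative cutoff used to keep the bound safe
    # Each intervening access evicts a given block with probability 1/usable.
    return ((usable - 1) / usable) ** reuse_distance


if __name__ == "__main__":
    for faults in range(4):
        print(faults, hit_probability(reuse_distance=2, ways=4, faulty_ways=faults))
```

Under these assumptions, each disabled block lowers the bound for the same reuse distance, which is why a timing analysis that ignores the detection mechanism can become optimistic.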
