Quasi-static fault-tolerant scheduling schemes for energy-efficient hard real-time systems

Highlights
• Quasi-static fault-tolerant task scheduling algorithms consisting of offline and online components are proposed.
• The algorithms are based on a fault model that considers the effect of DVS on the transient fault rate.
• The offline components are designed so that the online components can save energy using slack that arises from uncertainties in fault occurrences.
• The algorithms are validated both in simulation and on a real-life hard real-time testbed.

This paper investigates fault tolerance and dynamic voltage scaling (DVS) in hard real-time systems. The authors present quasi-static task scheduling algorithms that consist of offline components and online components. The offline components are designed so that the online components can save energy by exploiting the dynamic slack that arises from variations in task execution times and uncertainties in fault occurrences. The proposed schemes use a fault model that accounts for the effect of voltage scaling on the transient fault rate. Simulation results based on real-life task sets and processor data sheets show that the proposed scheduling schemes achieve energy savings of up to 50% over state-of-the-art low-energy offline scheduling techniques while incurring negligible runtime overhead. A real-life hard real-time testbed has been developed to validate the proposed algorithms.
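As a rough illustration of the two ideas summarized above, the sketch below pairs a voltage-dependent fault-rate model with a simple slack-based frequency choice. It is not the authors' algorithm. The exponential form of `fault_rate` is a model commonly used in the reliability-aware DVS literature and is assumed here; the parameters `lambda0`, `d`, and `f_min`, the function names, and the rule of reserving one full-speed re-execution are all hypothetical choices made for the example.

```python
def fault_rate(freq, lambda0=1e-6, d=2.0, f_min=0.4):
    """Transient fault rate at a normalized frequency in [f_min, 1].

    Assumed model: the rate grows exponentially as frequency (and with it
    supply voltage) is scaled down. lambda0 is the rate at full speed and
    d controls how sensitive the rate is to scaling.
    """
    if not f_min <= freq <= 1.0:
        raise ValueError("freq must lie in [f_min, 1]")
    return lambda0 * 10 ** (d * (1.0 - freq) / (1.0 - f_min))


def slowest_safe_frequency(wcet, budget, f_min=0.4):
    """Lowest normalized frequency that still leaves room for recovery.

    wcet   -- worst-case execution time at full speed
    budget -- time available before the deadline, including any slack
    The task runs for wcet / freq, and one re-execution at full speed
    (wcet) is reserved in case a fault is detected (assumed policy).
    """
    if budget < 2 * wcet:
        raise ValueError("no room for a full-speed re-execution")
    if budget == 2 * wcet:
        return 1.0  # no slack at all: run at full speed
    # Solve wcet / freq + wcet <= budget for freq, then clamp to [f_min, 1].
    needed = wcet / (budget - wcet)
    return min(1.0, max(f_min, needed))


if __name__ == "__main__":
    f = slowest_safe_frequency(wcet=2.0, budget=6.0)
    print(f"run at f = {f:.2f}, fault rate = {fault_rate(f):.3e}")
```

The example makes the trade-off visible: more slack permits a lower frequency and therefore less energy, but under the assumed model the fault rate rises as the frequency drops, which is why a recovery reservation is kept in the budget.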
