A Comparative Study of System-Level Energy Management Methods for Fault-Tolerant Hard Real-Time Systems

Low energy consumption and fault tolerance are often key objectives in the design of real-time embedded systems. However, these objectives are at odds, so designers must trade one off against the other. Real-time systems usually rely on system-level energy reduction methods, namely dynamic voltage scaling (DVS) and dynamic power management (DPM), and hard real-time systems often use replication to achieve fault tolerance. In this paper, we investigate the impact of system-level energy reduction methods on both the reliability and the energy consumption of hard real-time systems that use replication for fault tolerance. We consider four existing energy management methods: 1) classic DPM, 2) classic DVS, 3) the postponement method, a variation of DPM that is applicable only to replicated systems, and 4) the hybrid method, a combination of postponement and DVS. Based on this comparative study, we provide guidelines to help a designer decide which energy management method is most suitable for a given application. For example, we show that when reliability is the main concern, the postponement method is preferable, whereas when energy consumption is the primary concern, the hybrid method may be more appropriate.
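To make the energy/reliability tension concrete, the following is a minimal sketch of why slowing a processor with DVS can save energy while lowering reliability. It is not the model evaluated in the paper. Every name and constant here (`fault_rate`, `lam0`, `d`, `f_min`, `wcet`) is an illustrative assumption: the fault rate is assumed to grow exponentially as the normalized frequency drops, and the supply voltage is assumed to scale linearly with frequency.

```python
import math

def fault_rate(f, lam0=1e-6, d=2.0, f_min=0.25):
    """Assumed model: transient fault rate rises exponentially as the
    normalized frequency f (0 < f <= 1) is scaled down from 1."""
    return lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))

def task_reliability(f, wcet=10.0):
    """Probability that no transient fault hits a task whose execution
    time stretches from wcet (at f = 1) to wcet / f."""
    return math.exp(-fault_rate(f) * wcet / f)

def dynamic_energy(f, wcet=10.0):
    """Normalized dynamic energy, assuming voltage scales with frequency:
    power ~ f^3 and execution time ~ wcet / f, so energy ~ wcet * f^2."""
    return wcet * f ** 2

# Lower frequency: less energy, but a higher chance of a fault.
for f in (1.0, 0.75, 0.5, 0.25):
    print(f"f={f:.2f}  energy={dynamic_energy(f):6.3f}  "
          f"unreliability={1 - task_reliability(f):.3e}")
```

Under these assumptions, scaling down loses reliability twice over: the fault rate itself rises, and the task is exposed to it for longer. DPM, by contrast, keeps the frequency at its maximum and saves energy only during idle intervals.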
