Reliability-aware power management for parallel real-time applications with precedence constraints

The negative effects of Dynamic Voltage and Frequency Scaling (DVFS) on system reliability have recently prompted research on reliability-aware power management (RAPM). RAPM aims to reduce system energy consumption while preserving system reliability. In this paper, we study the RAPM problem for parallel real-time applications with precedence constraints running on shared-memory multiprocessor systems. We show that this problem is NP-hard. Depending on how recoveries are scheduled and utilized by a subset of selected tasks, we investigate both individual-recovery and shared-recovery based RAPM heuristics. We also consider online RAPM schemes that exploit dynamic slack generated at runtime. The proposed schemes are evaluated through extensive simulations. The results show that all schemes preserve system reliability under all settings. For modest system loads, all static schemes obtain similar energy savings. However, when the system load is low, the shared-recovery based schemes need coordinated recovery operations on all processors and thus save less energy. Moreover, by reclaiming dynamic slack, the online schemes yield better energy savings.
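The trade-off that RAPM addresses can be illustrated with a minimal sketch. The abstract does not state a fault or energy model, so everything below is an assumption: the exponential fault-rate model often used in the RAPM literature, a dynamic-energy model proportional to the square of the frequency (static power ignored), and hypothetical parameter values (`lam0`, `d`, `f_min`). Frequencies are normalized so that the maximum frequency is 1.0, and a recovery is assumed to be a re-execution at maximum frequency.

```python
import math

# Illustrative sketch only: the models and parameter values are assumptions,
# not taken from the paper.

def fault_rate(f, lam0=1e-6, d=2.0, f_min=0.1):
    """Assumed transient fault rate at normalized frequency f (f_max = 1.0).

    The rate grows exponentially as the frequency (and voltage) is lowered.
    """
    return lam0 * 10 ** (d * (1.0 - f) / (1.0 - f_min))

def reliability(wcet, f):
    """Probability that a task finishes without a transient fault.

    wcet is the worst-case execution time at f_max; at frequency f the
    task runs for wcet / f time units.
    """
    return math.exp(-fault_rate(f) * wcet / f)

def reliability_with_recovery(wcet, f):
    """Reliability when a scaled task is backed by a recovery at f_max.

    The task fails only if both the scaled run and the recovery fail.
    """
    r_scaled = reliability(wcet, f)
    r_recovery = reliability(wcet, 1.0)
    return r_scaled + (1.0 - r_scaled) * r_recovery

def dynamic_energy(wcet, f):
    """Assumed dynamic energy: power ~ f^3 over time wcet / f gives wcet * f^2."""
    return wcet * f ** 2

if __name__ == "__main__":
    wcet, f = 10.0, 0.5
    print("reliability at f_max:      ", reliability(wcet, 1.0))
    print("reliability scaled:        ", reliability(wcet, f))
    print("reliability with recovery: ", reliability_with_recovery(wcet, f))
    print("energy at f_max vs scaled: ", dynamic_energy(wcet, 1.0), dynamic_energy(wcet, f))
```

Under these assumptions, scaling a task down lowers both its energy and its reliability, while reserving a recovery brings the reliability back above that of the unscaled task. This is the property the individual-recovery and shared-recovery schemes rely on; how recoveries are placed among dependent tasks on several processors is the part this sketch does not model.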
