Energy Efficient Redundant Configurations for Reliable Servers in Distributed Systems

Modular redundancy and temporal redundancy are traditional techniques for increasing system reliability. With advances in technology, slack time can serve not only as temporal redundancy but also as a resource that energy management schemes exploit to save energy. In this paper, we consider combining modular and temporal redundancy to provide energy-efficient, reliable service with multiple servers. We first propose an efficient adaptive parallel recovery scheme that processes service requests in parallel to increase the number of faults that can be tolerated, and thus system reliability. We then explore schemes that determine the optimal redundant configurations of the parallel servers, either to minimize system energy consumption for a given reliability goal or to maximize system reliability for a given energy budget. Our analysis shows that small requests, optimistic approaches, and parallel recovery favor lower levels of modular redundancy, while restricted large requests, pessimistic approaches, and serial recovery favor higher levels of modular redundancy.
