Energy Ecient Redundant Configurations for Reliable Parallel Servers

Modular redundancy and temporal redundancy are traditional techniques to increase system reliability. In addition to being used as temporal redundancy, with technology advancements, slack time can also be used by energy management schemes to save energy. In this paper, we consider the combination of modular and temporal redundancy for reliable service provided by multiple servers. We first propose an ecient adaptive parallel recovery scheme that appropriately processes service requests in parallel to increase the number of faults that can be tolerated and thus system reliability. Then we explore schemes to determine the optimal redundant configurations of the parallel servers to minimize system energy consumption for a given reliability goal or to maximize system reliability for a given energy budget. Our analysis shows that parallel recovery, small requests and optimistic approaches favor lower levels of modular redundancy, while restricted serial recovery, large requests and pessimistic approaches favor higher levels of modular redundancy.

[1]  Eric Rotenberg,et al.  FAST: Frequency-aware static timing analysis , 2006, TECS.

[2]  Rami G. Melhem,et al.  Energy-efficient duplex and TMR real-time systems , 2002, 23rd IEEE Real-Time Systems Symposium, 2002. RTSS 2002..

[3]  Hagbae Kim,et al.  A Time Redundancy Approach to TMR Failures Using Fault-State Likelihoods , 1994, IEEE Trans. Computers.

[4]  Daniel Mossé,et al.  Energy-efficient policies for embedded clusters , 2005, LCTES '05.

[5]  Ying Zhang,et al.  Task feasibility analysis and dynamic voltage scaling in fault-tolerant real-time embedded systems , 2004, Proceedings Design, Automation and Test in Europe Conference and Exhibition.

[6]  Rami G. Melhem,et al.  Analysis of an energy efficient optimistic TMR scheme , 2004, Proceedings. Tenth International Conference on Parallel and Distributed Systems, 2004. ICPADS 2004..

[7]  Karthick Rajamani,et al.  Energy Management for Commercial Servers , 2003, Computer.

[8]  Daniel P. Siewiorek,et al.  Derivation and Calibration of a Transient Error Reliability Model , 1982, IEEE Transactions on Computers.

[9]  Rajesh K. Gupta,et al.  Leakage aware dynamic voltage scaling for real-time embedded systems , 2004, Proceedings. 41st Design Automation Conference, 2004..

[10]  Kevin Skadron,et al.  Power-aware QoS management in Web servers , 2003, RTSS 2003. 24th IEEE Real-Time Systems Symposium, 2003.

[11]  Rami G. Melhem,et al.  The effects of energy management on reliability in real-time embedded systems , 2004, IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004..

[12]  Ying Zhang,et al.  Energy-aware adaptive checkpointing in embedded real-time systems , 2003, 2003 Design, Automation and Test in Europe Conference and Exhibition.

[13]  Hiroto Yasuura,et al.  Voltage scheduling problem for dynamically variable voltage processors , 1998, Proceedings. 1998 International Symposium on Low Power Electronics and Design (IEEE Cat. No.98TH8379).

[14]  Michael Kistler,et al.  The case for power management in web servers , 2002 .

[15]  F. Frances Yao,et al.  A scheduling model for reduced CPU energy , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[16]  Alvin R. Lebeck,et al.  Power aware page allocation , 2000, SIGP.

[17]  Ravishankar K. Iyer,et al.  Measurement and modeling of computer reliability as affected by system activity , 1986, TOCS.

[18]  Scott Shenker,et al.  Scheduling for reduced CPU energy , 1994, OSDI '94.

[19]  E. N. Elnozahy,et al.  Energy-Efficient Server Clusters , 2002, PACS.

[20]  Anantha Chandrakasan,et al.  JouleTrack: a web based tool for software energy profiling , 2001, DAC '01.

[21]  Rami G. Melhem,et al.  The interplay of power management and fault recovery in real-time systems , 2004, IEEE Transactions on Computers.

[22]  Israel Koren,et al.  Towards energy-aware software-based fault tolerance in real-time systems , 2002, ISLPED '02.

[23]  S. Thompson MOS Scaling: Transistor Challenges for the 21st Century , 1998 .

[24]  Thomas D. Burd,et al.  Energy efficient CMOS microprocessor design , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[25]  Carla Schlatter Ellis,et al.  The Synergy Between Power-Aware Memory Systems and Processor Voltage Scaling , 2003, PACS.