A Technique for Self-Optimizing Scalable and Dependable Server Clusters under QoS Constraints

Cluster management has become a multi-objective task that spans disciplines such as power optimization, fault tolerance, dependability, and online analysis of system operation. Efficient and secure operation of these clusters is a key objective of any data center policy. In addition, the service these servers provide must meet an agreed level of quality of service (QoS) for customers. Applying self-management techniques to these clusters would simplify and automate their operation. Current self-management techniques that take service level agreements (SLAs) into account do not cover, at the same time, the two most important aspects of cluster operation: self-optimization, for efficient and profitable operation, and self-healing, for secure operation and a high level of QoS as perceived by users. This work integrates a self-optimization strategy for Internet server clusters, which optimizes power consumption through dynamic provisioning of servers, with a self-healing strategy that improves the cluster's reaction to a server failure by making intelligent use of the cluster's spare capacity. The self-management technique is based on empirical models of server response time and power consumption, which simplify its operation. Additionally, the technique presented in this paper guarantees fulfillment of the SLA.
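
To make the idea of dynamic provisioning under an SLA concrete, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical fitted response time curve and a hypothetical linear power model, both with illustrative constants, and it assumes that self-healing is approximated by keeping enough active servers for the SLA to hold even after a given number of them fail. All function names and numbers are assumptions introduced for illustration.

```python
import math

CAPACITY = 100.0  # assumed saturation rate of one server, in requests/s


def response_time_ms(rate_per_server: float) -> float:
    """Hypothetical empirical model: response time rises near saturation."""
    base_ms = 20.0  # assumed response time of a lightly loaded server
    if rate_per_server >= CAPACITY:
        return math.inf
    return base_ms / (1.0 - rate_per_server / CAPACITY)


def power_watts(rate_per_server: float) -> float:
    """Hypothetical empirical model: power grows linearly with load."""
    idle_w, peak_w = 150.0, 250.0  # assumed idle and peak power of one server
    return idle_w + (peak_w - idle_w) * min(rate_per_server / CAPACITY, 1.0)


def servers_needed(total_rate, sla_ms, total_servers, tolerated_failures=1):
    """Smallest number of active servers that still meets the SLA after
    `tolerated_failures` of them fail; None if the cluster is too small."""
    for n in range(tolerated_failures + 1, total_servers + 1):
        surviving = n - tolerated_failures
        if response_time_ms(total_rate / surviving) <= sla_ms:
            return n
    return None


def cluster_power_watts(total_rate, active_servers):
    """Total power of the active servers under an even load split."""
    return active_servers * power_watts(total_rate / active_servers)
```

Under these assumed models, calling `servers_needed(250.0, 50.0, 10)` picks the smallest active set that tolerates one failure, and `cluster_power_watts` estimates what that set draws. Servers outside the active set could then be powered down, which is where the energy saving would come from in a scheme of this kind.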
