Proactive power- and thermal-aware optimizations for energy-efficient cloud computing

Cloud computing addresses the problem of costly computing infrastructures by providing elastic, dynamic resource provisioning on a pay-as-you-go basis, and it is nowadays considered a valid alternative to owned high-performance computing clusters. There are two main incentives for this emerging paradigm: first, the utility-based usage models provided by Clouds allow clients to pay per use, increasing user satisfaction; second, only a relatively low investment is required for the remote devices that access the Cloud resources. Computational demand on data centers is increasing due to the growing popularity of Cloud applications. However, these facilities are becoming unsustainable in terms of power consumption and growing energy costs. Nowadays, the data center industry consumes about 2% of the worldwide energy production. Moreover, the proliferation of urban data centers is responsible for increases in power demand of up to 70% in metropolitan areas, where the power density is becoming too high for the power grid. Within two or three years, this situation is expected to cause outages in 95% of urban data centers, incurring annual costs of about US$2 million per infrastructure. Beyond the economic impact, the heat and carbon footprint generated by cooling systems in data centers are increasing dramatically and are expected to overtake airline industry emissions by 2020. The Cloud model is helping to mitigate this issue by increasing the overall utilization of data centers, thus reducing the carbon footprint per executed task and diminishing CO2 emissions. According to Schneider Electric's report on virtualization and Cloud computing efficiency, Cloud computing offers around a 17% reduction in energy consumption by sharing computing resources among all users. However, Cloud providers need to implement energy-efficient management of physical resources to meet the growing demand for their services while ensuring sustainability.
The main sources of energy consumption in data centers are the computational Information Technology (IT) and cooling infrastructures. IT represents around 60% of the total consumption, with the static power dissipation of idle servers being the dominant contribution. The cooling infrastructure, in turn, accounts for around 30% of the overall consumption to ensure the reliability of the computational infrastructure. The key factor that determines cooling requirements is the maximum temperature reached on the servers due to their activity, which depends on both room temperature and workload allocation. Static consumption of servers represents about 70% of the IT power. This issue is intensified by the exponential influence of temperature on leakage currents. Leakage power is a component of the total power consumption in data centers that is not traditionally considered when choosing the room set point temperature. However, this power contribution, which increases with temperature, can determine the savings achievable through proactive management of the cooling system. One of the major challenges in understanding the thermal influence on static energy at the data center scope is the description of the trade-offs between leakage and cooling consumption. The Cloud model is helping to reduce static consumption from two perspectives, based on virtual machine allocation and consolidation. First, power-aware policies reduce static consumption by increasing overall utilization, so the set of operating servers can be reduced; Dynamic Voltage and Frequency Scaling (DVFS) is applied for power capping, lowering servers' energy consumption. Second, thermal-aware strategies help to reduce hot spots in the IT infrastructure by spreading the workload, so the set point room temperature can be increased, resulting in cooling savings. Both thermal and power approaches have the potential to improve energy efficiency in Cloud facilities.
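The leakage/cooling trade-off described above can be illustrated with a minimal sketch: static power is modeled as growing exponentially with temperature, while cooling power is derived from a quadratic coefficient-of-performance (COP) curve that improves at warmer set points. All coefficients (p_leak_ref, alpha, the COP polynomial, p_dynamic) are illustrative assumptions for exposition, not measurements from this work.

```python
import math

def leakage_power(t_c, p_leak_ref=100.0, t_ref=20.0, alpha=0.05):
    """Static (leakage) power in watts, growing roughly exponentially
    with server temperature. Coefficients are illustrative."""
    return p_leak_ref * math.exp(alpha * (t_c - t_ref))

def cooling_cop(t_c):
    """Coefficient of performance of the cooling system vs. supplied air
    temperature (quadratic curve; coefficients are a common textbook-style
    approximation, not taken from this thesis)."""
    return 0.0068 * t_c ** 2 + 0.0008 * t_c + 0.458

def total_power(t_c, p_dynamic=400.0):
    """IT power (dynamic + temperature-dependent leakage) plus the cooling
    power needed to remove it: P_cool = P_IT / COP."""
    p_it = p_dynamic + leakage_power(t_c)
    return p_it + p_it / cooling_cop(t_c)

# Sweep candidate room set points (degrees C): raising the set point cuts
# cooling power but inflates leakage, so total power has an interior minimum.
best_t = min(range(15, 36), key=total_power)
```

Under these assumed coefficients the optimum falls strictly inside the sweep range, which captures the core observation: neither the coldest nor the warmest set point minimizes total energy once leakage is accounted for.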
Unfortunately, these policies are not jointly applied due to the lack of models that include parameters from both power and thermal approaches. Deriving fast and accurate power models that incorporate these characteristics, targeting high-end servers, would allow us to combine power and temperature in an energy-efficient management. Furthermore, as Cloud applications expect services to be delivered as per the Service Level Agreement (SLA), power consumption in data centers has to be minimized while meeting this requirement whenever it is feasible. Also, as opposed to high performance computing, Cloud workloads vary significantly over time, making optimal allocation and DVFS configuration a non-trivial task. A major challenge in guaranteeing the Quality of Service (QoS) of Cloud applications lies in analyzing the trade-offs between consolidation and performance that help to combine DVFS with power and thermal strategies. The main objective of this Ph.D. thesis is to address the energy challenge in Cloud data centers from a thermal- and power-aware perspective using proactive strategies. Our research proposes the design and implementation of models and global optimizations that jointly consider the energy consumption of both computing and cooling resources while maintaining QoS, from a new holistic perspective.
Thesis Contributions: To support the thesis that our research can deliver significant value in the area of Cloud energy efficiency, compared to traditional approaches, we have:
1) Defined a taxonomy on energy efficiency that compiles the different levels of abstraction that can be found in the data center area.
2) Classified state-of-the-art approaches according to the proposed taxonomy, identifying new open challenges from a holistic perspective.
3) Identified the trade-offs between leakage and cooling consumption based on empirical research.
4) Proposed novel modeling techniques for the automatic identification of fast and accurate models, tested in a real environment.
5) Analyzed DVFS, performance and power trade-offs in the Cloud environment.
6) Designed and implemented a novel proactive optimization policy for the dynamic consolidation of virtual machines that combines DVFS and power-aware strategies while ensuring QoS.
7) Derived thermal models for CPU and memory devices, validated in a real environment.
8) Designed and implemented new proactive approaches that include DVFS, thermal and power considerations in both cooling and IT consumption from a novel holistic perspective.
9) Validated our optimization strategies in a simulation environment.
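To give a concrete flavor of how DVFS and power-aware consolidation interact (as in contribution 6), the following sketch packs VM CPU demands onto as few hosts as possible with first-fit decreasing, then assigns each active host the lowest DVFS step whose capacity still covers its load, as a crude QoS proxy. The parameters (P_STATIC, FREQS, CAP_PER_GHZ, K_DYN) and the cubic dynamic-power model (P ~ C*V^2*f with V roughly linear in f) are illustrative assumptions, not the actual policy implemented in the thesis.

```python
P_STATIC = 100.0          # idle (static) power per active host, watts
FREQS = [1.2, 1.8, 2.4]   # available DVFS steps, GHz (assumed)
CAP_PER_GHZ = 40.0        # CPU capacity units delivered per GHz (assumed)
K_DYN = 15.0              # dynamic-power coefficient (illustrative)

def consolidate(demands, capacity):
    """First-fit decreasing: pack VM CPU demands onto hosts of given
    capacity, opening a new host only when no open host fits."""
    hosts = []
    for d in sorted(demands, reverse=True):
        for h in hosts:
            if sum(h) + d <= capacity:
                h.append(d)
                break
        else:
            hosts.append([d])
    return hosts

def host_power(load):
    """Pick the lowest DVFS step whose capacity covers the load (QoS
    proxy), then apply a cubic dynamic-power model on top of static power."""
    for f in FREQS:
        if f * CAP_PER_GHZ >= load:
            return P_STATIC + K_DYN * f ** 3
    return P_STATIC + K_DYN * FREQS[-1] ** 3  # saturated: max step

demands = [30, 25, 20, 15, 10, 5]
hosts = consolidate(demands, capacity=FREQS[-1] * CAP_PER_GHZ)
total = sum(host_power(sum(h)) for h in hosts)
```

With these demands, consolidation keeps two hosts active instead of six, and the lightly loaded host runs at the lowest frequency step, so both static power (fewer active servers) and dynamic power (lower voltage/frequency) are reduced at once, which is exactly the interplay the joint policy exploits.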