Energy efficiency of data centers has attracted wide research attention amid growing concern over power consumption and heat dissipation. MapReduce, an efficient programming model for data-intensive computing, is increasingly popular among industrial companies and academic organizations. Because MapReduce was developed specifically for large-scale data analysis, its impact on the energy efficiency of data centers has not been well scrutinized. Several energy-conserving strategies have recently been proposed to reduce the overall power consumption of MapReduce clusters. Their fundamental ideas can be summarized as scaling down the number of working nodes and reducing execution time. However, little research has addressed energy prediction for MapReduce workloads. Such prediction can guide cluster administrators in setting power budgets or in scheduling workloads onto clusters with different power budgets, and it is also useful for monitoring the energy consumption of workloads. In this paper, we identify several workload metrics that correlate strongly with energy consumption. We analyze these metrics with multivariate linear regression, construct a prediction model from them, and then apply extensive regression diagnostics to refine the model. Applied to the Word Count and Sort workloads with various input sizes, the model proves highly accurate: its predictions deviate from the observed energy consumption by 0.12% in the best case and 0.15% in the worst case.
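The modeling steps summarized above (fit energy as a linear function of several workload metrics, check the fit, then measure prediction error against observed energy) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The two metrics (input size in GB and execution time in seconds), the sample values, and the helper names `fit_ols`, `predict`, `r_squared`, and `percent_error` are all hypothetical. The sample energies are generated from an exact linear formula purely so the fit is easy to verify; they are not measurements. R² stands in for one common regression diagnostic, and the error is assumed to be |predicted - observed| / observed.

```python
"""Sketch: multivariate linear regression for job energy prediction.

Assumption: metric names and all numbers below are hypothetical
placeholders, not metrics or data from the paper.
"""
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented copy so the caller's matrices are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear metrics?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] via the normal equations X^T X b = X^T y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Linear prediction: intercept + sum of coefficient * metric."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r_squared(coef, X, y) -> float:
    """Coefficient of determination, one basic regression diagnostic."""
    mean = sum(y) / len(y)
    ss_res = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def percent_error(predicted: float, observed: float) -> float:
    """Relative deviation of a prediction from the observed value, in percent."""
    return abs(predicted - observed) / observed * 100.0


# Hypothetical per-job metrics: [input size (GB), execution time (s)].
SAMPLE_X = [[1, 100], [2, 150], [4, 260], [8, 500], [16, 1000]]
# Hypothetical energy (kJ), generated as 50 + 2*size + 0.5*time for illustration.
SAMPLE_Y = [102.0, 129.0, 188.0, 316.0, 582.0]

if __name__ == "__main__":
    coef = fit_ols(SAMPLE_X, SAMPLE_Y)
    print("coefficients:", [round(c, 6) for c in coef])
    print("R^2:", round(r_squared(coef, SAMPLE_X, SAMPLE_Y), 6))
    est = predict(coef, [10, 600])  # a new, hypothetical job
    print("predicted energy (kJ):", round(est, 3))
```

In practice a numerically robust solver such as `numpy.linalg.lstsq` would be preferable to forming the normal equations explicitly, and a fuller diagnosis would also inspect residuals, multicollinearity among the metrics, and influential observations.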