CPU Frequency Tuning to Improve Energy Efficiency of MapReduce Systems

Energy efficiency is a major concern in today's data centers, which house large-scale distributed processing systems such as data-parallel MapReduce clusters. Modern power-aware systems use the dynamic voltage and frequency scaling (DVFS) mechanism available in processors to manage energy consumption. In this paper, we first characterize the energy efficiency of MapReduce jobs with respect to the built-in power governors. Our analysis indicates that while a built-in power governor provides the best energy efficiency for a job that is both CPU and IO intensive, a common CPU frequency across the cluster provides the best energy efficiency for other types of jobs. To identify this optimal frequency setting, we derive energy and performance models for MapReduce jobs on an HPC cluster and validate these models experimentally on different platforms. We demonstrate how these models can be used to improve the energy efficiency of machine learning MapReduce applications running on the YARN platform. Executing jobs at their optimal frequencies improves energy efficiency by 25% on average over the default governor setting. For mixed workloads, energy efficiency improves by up to 10% when we use an optimal CPU frequency across the cluster.
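To make the idea of a model-driven optimal frequency concrete, the following is a minimal sketch and not the models derived in the paper. It assumes a hypothetical runtime model in which only the CPU-bound part of a job scales with frequency, and the common textbook approximation that dynamic power grows with the cube of frequency on top of a fixed static power. All function names, parameters, and numeric values are illustrative assumptions.

```python
def runtime_s(freq_ghz, cpu_work_gcycles, io_time_s):
    """Assumed runtime model: CPU time scales as 1/f, IO time does not scale."""
    return cpu_work_gcycles / freq_ghz + io_time_s


def power_w(freq_ghz, static_w, dyn_coeff):
    """Assumed power model: static power plus a cubic dynamic term."""
    return static_w + dyn_coeff * freq_ghz ** 3


def energy_j(freq_ghz, cpu_work_gcycles, io_time_s, static_w, dyn_coeff):
    """Energy is average power multiplied by runtime."""
    return power_w(freq_ghz, static_w, dyn_coeff) * runtime_s(
        freq_ghz, cpu_work_gcycles, io_time_s
    )


def best_frequency(freqs_ghz, cpu_work_gcycles, io_time_s, static_w, dyn_coeff):
    """Return the candidate frequency with the lowest modeled energy."""
    return min(
        freqs_ghz,
        key=lambda f: energy_j(f, cpu_work_gcycles, io_time_s, static_w, dyn_coeff),
    )


# Hypothetical job and node parameters, chosen only for illustration.
candidate_freqs = [1.2, 1.6, 2.0, 2.4]
chosen = best_frequency(
    candidate_freqs,
    cpu_work_gcycles=1000.0,
    io_time_s=100.0,
    static_w=50.0,
    dyn_coeff=5.0,
)
print(f"Lowest modeled energy at {chosen} GHz")
```

Under these assumptions the lowest-energy frequency is neither the minimum nor the maximum: running slower stretches the runtime over which static power is paid, while running faster inflates dynamic power. A fitted model of this general kind is what lets a single cluster-wide frequency be chosen per job type.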
