Energy Efficiency for MapReduce Workloads: An In-depth Study

Energy efficiency has emerged as a crucial optimization goal in data centers. MapReduce has become a popular, even fashionable, distributed processing model for parallel computing in data centers. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. In this paper, we conduct an in-depth study of the energy efficiency of MapReduce workloads. We identify four factors that affect the energy efficiency of MapReduce. In particular, we run experiments on four typical MapReduce workloads that represent different kinds of application scenarios, and we measure energy consumption under varied cluster parameters. Our key finding is that, with well-tuned system parameters and adaptive resource configurations, a MapReduce cluster can in some instances achieve both improved performance and good energy savings simultaneously, which is in surprising contrast to previous work on cluster-level energy conservation.
