Energy efficient scheduling of MapReduce workloads on heterogeneous clusters

Energy efficiency has become the center of attention in emerging data center infrastructures as increasing energy costs continue to outgrow all other operating expenditures. In this work we investigate energy aware scheduling heuristics to increase the energy efficiency of MapReduce workloads on heterogeneous Hadoop clusters comprising both low power (wimpy) and high performance (brawny) nodes. We first make a case for heterogeneity by showing that low power Intel Atom processors and high performance Intel Sandy Bridge processors are more energy efficient for I/O bound workloads and CPU bound workloads, respectively. Then we present several energy efficient scheduling heuristics that exploit this heterogeneity and real-time power measurements enabled by modern processor architectures. Through experiments on a 23-node heterogeneous Hadoop cluster we demonstrate up to 27% better energy efficiency with our heuristics compared with the default Hadoop scheduler.

[1]  Feng Zhao,et al.  Energy aware consolidation for cloud computing , 2008, CLUSTER 2008.

[2]  Christoforos E. Kozyrakis,et al.  On the energy (in)efficiency of Hadoop clusters , 2010, OPSR.

[3]  Randy H. Katz,et al.  Improving MapReduce Performance in Heterogeneous Environments , 2008, OSDI.

[4]  Urs Hölzle,et al.  Brawny cores still beat wimpy cores, most of the time , 2010 .

[5]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[6]  Randy H. Katz,et al.  An energy case for hybrid datacenters , 2010, OPSR.

[7]  Thomas F. Wenisch,et al.  PowerNap: eliminating server idle power , 2009, ASPLOS.

[8]  Rini T. Kaushik,et al.  GreenHDFS: towards an energy-conserving, storage-efficient, hybrid Hadoop compute cluster , 2010 .

[9]  Jie Huang,et al.  The HiBench benchmark suite: Characterization of the MapReduce-based data analysis , 2010, 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010).

[10]  Akshat Verma,et al.  pMapper: Power and Migration Cost Aware Application Placement in Virtualized Systems , 2008, Middleware.

[11]  Scott Shenker,et al.  Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.

[12]  Aman Kansal,et al.  Energy Efficient Data Intensive Distributed Computing , 2011, IEEE Data Eng. Bull..

[13]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[14]  Jignesh M. Patel,et al.  Wimpy node clusters: what about non-wimpy workloads? , 2010, DaMoN '10.

[15]  Jignesh M. Patel,et al.  Energy management for MapReduce clusters , 2010, Proc. VLDB Endow..

[16]  Amar Phanishayee,et al.  FAWN: a fast array of wimpy nodes , 2009, SOSP '09.