An Empirical Study of Hadoop's Energy Efficiency on a HPC Cluster

Abstract

The Map-Reduce programming model is commonly used for efficient scientific computation because it executes tasks in a parallel and distributed manner on large data volumes. HPC infrastructure can effectively increase the parallelism of map-reduce tasks; however, such execution incurs high energy and data-transmission costs. Here we empirically study how the energy efficiency of a map-reduce job varies with increasing parallelism and network bandwidth on an HPC cluster. We also investigate how effectively power-aware systems manage the energy consumption of different types of map-reduce jobs. We find that for some jobs energy efficiency degrades at a high degree of parallelism, while for others it improves at low CPU frequency. Consequently, we suggest strategies for configuring the degree of parallelism, network bandwidth, and power-management features in an HPC cluster for energy-efficient execution of map-reduce jobs.
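The abstract does not say how the energy efficiency of a job is measured. One common choice, assumed here, is useful work per joule, with the job's energy obtained by integrating sampled cluster power over its run time. The hypothetical helpers `job_energy_joules` and `energy_efficiency` below sketch that computation. The power-sampling setup and the records-per-joule metric are assumptions for illustration, not the paper's method.

```python
from typing import Sequence


def job_energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate sampled power (watts) over time (seconds) with the trapezoidal rule.

    Assumption: power is sampled at the cluster level during the job's run.
    """
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between consecutive power samples.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def energy_efficiency(records_processed: int, energy_j: float) -> float:
    """Hypothetical metric: records processed per joule (higher is better)."""
    if energy_j <= 0:
        raise ValueError("energy must be positive")
    return records_processed / energy_j


if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the study.
    e = job_energy_joules([0.0, 10.0, 20.0], [200.0, 220.0, 210.0])
    print(e)
    print(energy_efficiency(1_000_000, e))
```

Running the same job under different settings (for example, a different degree of parallelism or CPU frequency) and comparing the resulting efficiency values is one way to frame the trade-offs the abstract describes: a lower CPU frequency reduces power draw, so energy falls whenever run time grows less than power shrinks.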
