Towards Thermal-Efficient Hadoop Clusters through Scheduling

In this study, we propose a new resource- and thermal-aware scheduler for Hadoop clusters. The scheduler aims to minimize the peak inlet temperature across all nodes in order to reduce power consumption and cooling cost in data centers. It is dynamic: it makes job scheduling decisions based on the current CPU and disk utilization and the number of running tasks, as well as on the feedback given by all slave nodes at run time. We deploy a thermal model that projects the temperature of each slave node, taking into account the heat contributed by its neighbors. The thermal-aware scheduler is integrated with Hadoop's scheduling mechanism. We evaluate our scheduler by running a set of Hadoop benchmarks (e.g., WordCount, DistributedGrep, PI, and TeraSort) under various temperature conditions, utilization thresholds, and cluster sizes. The experimental results show that our scheduler reduces the average inlet temperature by 2.5 °C compared with the default FIFO scheduler, and that it saves approximately 15% of cooling cost with marginal performance degradation.
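The selection step described above can be illustrated with a minimal sketch. It is not the scheduler evaluated in this study: the linear thermal model, every coefficient, the `Node` type, and the function names `own_heat`, `projected_inlet`, and `pick_node` are hypothetical and chosen only to show the idea of combining utilization, task count, and neighbor heat into a projected inlet temperature and placing the next task on the coolest eligible node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Run-time feedback assumed to be reported by one slave node."""
    name: str
    cpu_util: float       # fraction in [0, 1]
    disk_util: float      # fraction in [0, 1]
    running_tasks: int
    neighbors: list = field(default_factory=list)  # names of adjacent nodes


# Hypothetical coefficients for illustration only; not taken from the study.
AMBIENT_C = 20.0       # supply air temperature in degrees Celsius
CPU_COEFF = 10.0       # degrees added at 100% CPU utilization
DISK_COEFF = 4.0       # degrees added at 100% disk utilization
TASK_COEFF = 0.5       # degrees added per running task
NEIGHBOR_COEFF = 0.2   # share of a neighbor's heat reaching this inlet


def own_heat(node):
    """Temperature rise attributed to the node's own load (assumed linear)."""
    return (CPU_COEFF * node.cpu_util
            + DISK_COEFF * node.disk_util
            + TASK_COEFF * node.running_tasks)


def projected_inlet(node, nodes):
    """Projected inlet temperature: ambient + own heat + neighbor contribution."""
    neighbor_heat = sum(own_heat(nodes[n]) for n in node.neighbors)
    return AMBIENT_C + own_heat(node) + NEIGHBOR_COEFF * neighbor_heat


def pick_node(nodes, util_threshold=0.8):
    """Return the eligible node with the lowest projected inlet temperature.

    A node is eligible when both CPU and disk utilization are below the
    threshold. Returns None when no node is eligible.
    """
    eligible = [n for n in nodes.values()
                if n.cpu_util < util_threshold and n.disk_util < util_threshold]
    if not eligible:
        return None
    return min(eligible, key=lambda n: projected_inlet(n, nodes))
```

Under these assumed coefficients, a lightly loaded node that sits next to a hot node can have a higher projected inlet temperature than a slightly busier node farther away, so the sketch may prefer the latter. This is the kind of decision a purely utilization-based policy would not make.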
