Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework

With the recent emergence of cloud-computing-based services on the Internet, MapReduce and distributed file systems such as HDFS have become the paradigm of choice for developing large-scale, data-intensive applications. Given the scale at which these applications are deployed, minimizing the power consumption of the clusters that run them can significantly cut operational costs and reduce their carbon footprint, thereby increasing utility from a provider's point of view. This paper addresses energy conservation for clusters of nodes that run MapReduce jobs. The proposed algorithm dynamically reconfigures the cluster based on the current workload, turning cluster nodes on or off when the average cluster utilization rises above or falls below administrator-specified thresholds, respectively. We evaluate the algorithm using the GridSim toolkit; the results show that it achieves an energy reduction of 33% under average workloads and up to 54% under low workloads.
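The threshold rule described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Node` class, the `reconfigure` function, the choice to change one node per step, and the choice to power off the least-utilized node are all assumptions made for illustration, and data placement and load migration are omitted entirely.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node; the fields are illustrative assumptions."""
    name: str
    active: bool
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def average_utilization(nodes):
    """Mean utilization over the nodes that are currently powered on."""
    active = [n for n in nodes if n.active]
    if not active:
        return 0.0
    return sum(n.utilization for n in active) / len(active)


def reconfigure(nodes, lower, upper):
    """Run one reconfiguration step against the administrator thresholds.

    Returns "scale_up", "scale_down", or "steady". Changing a single node
    per step is an assumption of this sketch, not a claim about the paper.
    """
    avg = average_utilization(nodes)
    active = [n for n in nodes if n.active]

    if avg > upper:
        # Utilization too high: power on the first idle node, if any exists.
        for node in nodes:
            if not node.active:
                node.active = True
                node.utilization = 0.0
                return "scale_up"
        return "steady"

    if avg < lower and len(active) > 1:
        # Utilization too low: power off the least-loaded active node
        # (assumed victim policy), always keeping at least one node on.
        victim = min(active, key=lambda n: n.utilization)
        victim.active = False
        return "scale_down"

    return "steady"


if __name__ == "__main__":
    cluster = [
        Node("n1", True, 0.90),
        Node("n2", True, 0.95),
        Node("n3", False, 0.0),
    ]
    # Thresholds of 0.3 and 0.8 are placeholder values, not from the paper.
    print(reconfigure(cluster, lower=0.3, upper=0.8))
```

Using two separate thresholds rather than one leaves a band in which the cluster stays unchanged, which keeps nodes from being toggled repeatedly when utilization hovers near a single cutoff.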
