Exploiting Spatio-Temporal Tradeoffs for Energy-Aware MapReduce in the Cloud

MapReduce is a distributed computing paradigm widely used for building large-scale data processing applications. In cloud environments, MapReduce clusters are created dynamically from virtual machines (VMs) and managed by the cloud provider. In this paper, we study the energy efficiency problem for such MapReduce clouds. We describe a unique spatio-temporal tradeoff with two components: efficient spatial fitting of VMs on servers, which achieves high utilization of machine resources, and balanced temporal fitting, which co-locates VMs with similar runtimes so that a server runs at high utilization throughout its uptime. We propose VM placement algorithms that explicitly incorporate these tradeoffs. We also propose techniques that dynamically scale MapReduce clusters to further reduce energy consumption while ensuring that jobs meet or improve on their expected runtimes. Our algorithms achieve energy savings over existing placement techniques, and an additional optimization technique achieves further savings while simultaneously improving job performance.
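To make the spatio-temporal tradeoff concrete, the following is a minimal sketch of a greedy placement heuristic. It is not the paper's algorithm. The names `VM`, `Server`, `place`, and `total_uptime`, the single CPU dimension, the weight `alpha`, and the cost formula are all assumptions made for illustration. The sketch also assumes that a server stays powered on until its longest-running VM finishes, so the sum of server uptimes serves as a rough proxy for energy.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    cpu: float      # share of one server's CPU capacity (hypothetical unit)
    runtime: float  # expected runtime, e.g. in minutes


@dataclass
class Server:
    capacity: float = 1.0
    vms: list = field(default_factory=list)

    def free(self):
        """Capacity not yet claimed by placed VMs."""
        return self.capacity - sum(vm.cpu for vm in self.vms)

    def uptime(self):
        """Assumed uptime: the server runs until its longest VM finishes."""
        return max((vm.runtime for vm in self.vms), default=0.0)


def place(vms, alpha=0.5):
    """Greedy placement that weighs spatial fit against temporal fit.

    alpha is a hypothetical weight: 1.0 considers only leftover capacity,
    0.0 considers only runtime similarity.
    """
    servers = []
    # Place the longest-running VMs first so they define server uptimes.
    for vm in sorted(vms, key=lambda v: v.runtime, reverse=True):
        best, best_cost = None, None
        for s in servers:
            if s.free() + 1e-9 < vm.cpu:
                continue  # VM does not fit on this server
            spatial = s.free() - vm.cpu  # capacity left idle after placement
            longest = max(s.uptime(), vm.runtime)
            temporal = abs(s.uptime() - vm.runtime) / longest  # runtime mismatch
            cost = alpha * spatial + (1 - alpha) * temporal
            if best_cost is None or cost < best_cost:
                best, best_cost = s, cost
        if best is None:
            # No existing server can host the VM, so power on a new one.
            best = Server()
            servers.append(best)
        best.vms.append(vm)
    return servers


def total_uptime(servers):
    """Sum of server uptimes, used here as a rough energy proxy."""
    return sum(s.uptime() for s in servers)


if __name__ == "__main__":
    demo = [
        VM("long-a", 0.5, 100), VM("short-a", 0.5, 10),
        VM("long-b", 0.5, 100), VM("short-b", 0.5, 10),
    ]
    for s in place(demo):
        print([vm.name for vm in s.vms], s.uptime())
```

On the demo input, the sketch pairs the two long-running VMs on one server and the two short-running VMs on another, for a total uptime of 110. A placement that mixes one long and one short VM per server fills the servers just as well spatially, but keeps both servers on for 100 time units each.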
