Not All Joules are Equal: Towards Energy-Efficient and Green-Aware Data Processing Frameworks

Interest has been growing in integrating renewable energy into data centers, attracting many research efforts to develop green-aware algorithms and systems. However, little attention has been paid to the efficiency of each joule consumed by data center workloads. In fact, not all joules are equal: the amount of work that a joule can do varies significantly within a data center. Ignoring this fact leads to significant energy waste (25% of the total energy consumption in Hadoop YARN on a Facebook production trace, according to our study). In this paper, we investigate how to exploit joule efficiency to maximize the benefits of renewable energy for the MapReduce framework. We develop job/task scheduling algorithms with a particular focus on the factors that affect joule efficiency in the data center, including the energy efficiency of MapReduce workloads, the renewable energy supply, and battery usage. We further develop a simple yet effective performance-energy consumption model to guide our scheduling decisions. We have implemented GreenMR, an energy-efficient and green-aware MapReduce framework, on top of Hadoop YARN. The experiments demonstrate the accuracy of our models and the effectiveness of our energy-efficient and green-aware optimizations over Hadoop YARN and a state-of-the-art green-aware Hadoop YARN implementation.
