JouleMR: Towards Cost-Effective and Green-Aware Data Processing Frameworks

Interest has been growing in managing cluster energy effectively to reduce both energy consumption and electricity cost. Renewable energy and dynamic pricing schemes in smart grids are two major emerging trends in energy markets. However, current data processing frameworks are unaware of how efficiently each joule is used by data center workloads in the context of these two trends. In fact, not all joules are equal: the amount of work that a joule can do varies significantly in data centers. Ignoring this fact leads to significant energy waste (25 percent of the total energy consumption in Hadoop YARN on a Facebook production trace, according to our study). In this paper, we propose JouleMR, a cost-effective and green-aware data processing framework. Specifically, we investigate how to exploit joule efficiency to maximize the benefits of renewable energy and dynamic pricing schemes for the MapReduce framework. We develop job/task scheduling algorithms that focus on the factors affecting joule efficiency in the data center, including the energy efficiency of MapReduce workloads, renewable energy supply, dynamic pricing, and battery usage. We further develop a simple yet effective performance-energy consumption model to guide our scheduling decisions. We have implemented JouleMR on top of Hadoop YARN. The experiments demonstrate the accuracy of our models and show that our cost-effective and green-aware optimizations outperform the state-of-the-art implementations on Hadoop YARN.
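To make the idea that not all joules are equal concrete, the following minimal sketch picks the time slot in which a deferrable job incurs the lowest grid cost. It is an illustration only and is not JouleMR's scheduling algorithm: the `Slot` type, the helper names, and all prices and energy figures are hypothetical, and renewable energy is assumed to be free while any shortfall is bought from the grid at that slot's price.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    hour: int             # start hour of the slot (hypothetical)
    price: float          # grid price in $/kWh (hypothetical)
    renewable_kwh: float  # renewable energy available in the slot (hypothetical)


def grid_cost(slot: Slot, demand_kwh: float) -> float:
    """Cost of running a job in this slot, assuming renewable energy is free
    and only the shortfall is purchased from the grid."""
    brown_kwh = max(0.0, demand_kwh - slot.renewable_kwh)
    return brown_kwh * slot.price


def cheapest_slot(slots: list[Slot], demand_kwh: float) -> Slot:
    """Return the slot with the lowest grid cost; ties go to the earliest hour."""
    return min(slots, key=lambda s: (grid_cost(s, demand_kwh), s.hour))


# Three candidate slots: cheap night power, sunny midday, expensive evening peak.
slots = [
    Slot(hour=0, price=0.08, renewable_kwh=0.0),
    Slot(hour=12, price=0.20, renewable_kwh=1.5),
    Slot(hour=18, price=0.30, renewable_kwh=0.2),
]
best = cheapest_slot(slots, demand_kwh=2.0)
print(best.hour, round(grid_cost(best, 2.0), 2))
```

In this toy setting the same 2 kWh job costs a different amount in each slot, so the midday slot wins despite its higher grid price because most of the demand is covered by renewable supply. A real scheduler would also need to account for deadlines, battery state, and workload energy efficiency, which this sketch omits.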
