A Study of Big Data Computing Platforms: Fairness and Energy Consumption

Improving performance has long been the primary goal of large-scale data processing frameworks, and a rich body of work has been proposed in this direction. In contrast, the fairness and energy consumption of these frameworks need further exploration, and how performance, fairness and energy consumption interact with each other in big data computing frameworks has not been well addressed. In our research, we study the fairness and energy consumption of big data computing systems. We find that there are tradeoffs among these factors, and we study in detail the factors that influence these tradeoffs. Based on the observations in our study, we propose workload-aware, energy-efficient and green-aware optimizations and implement them in Hadoop YARN. Specifically, this thesis proposal explores the following research problems. First, we explore the tradeoff between fairness and performance, and improve the performance of the state-of-the-art approach by up to 225% [7]. Second, we take into account energy efficiency, renewable energy supply and battery usage, and reduce the brown energy consumption of existing systems by more than 25% [8]. Third, we will explore the relationship between fairness and energy consumption, and eventually develop multi-objective optimizations for performance, fairness and energy consumption.
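The brown-energy metric in the second research problem can be illustrated with a small accounting model. Brown energy is the energy drawn from the grid that renewable supply and stored battery energy cannot cover. The following is a minimal sketch under several assumptions, and it is not the optimization developed in this work:

- time is divided into discrete slots;
- the battery is lossless and starts empty;
- a greedy policy uses green energy first, stores any surplus, and discharges the battery before drawing from the grid.

The function name `brown_energy` and all numbers are hypothetical. The example shows why green-aware schedulers such as GreenHadoop shift deferrable work into slots with high renewable supply.

```python
def brown_energy(demand, green, battery_capacity, battery_level=0.0):
    """Return the total brown (grid) energy, in kWh, needed to meet demand.

    Hypothetical greedy policy, for illustration only:
    - green energy is used first in every slot;
    - any green surplus charges the battery, up to its capacity;
    - any deficit is drawn from the battery before the grid.
    Charge/discharge losses are ignored (an assumption).
    """
    brown = 0.0
    for need, supply in zip(demand, green):
        if supply >= need:
            # Surplus green energy is stored; anything beyond capacity is wasted.
            battery_level = min(battery_capacity, battery_level + (supply - need))
        else:
            deficit = need - supply
            from_battery = min(battery_level, deficit)
            battery_level -= from_battery
            # Whatever the battery cannot cover comes from the grid.
            brown += deficit - from_battery
    return brown


green = [4, 8, 4, 0]       # hypothetical solar output per slot (kWh)
night_run = [2, 2, 2, 6]   # a deferrable 4 kWh job placed in the last slot
noon_run = [2, 6, 2, 2]    # the same job shifted to the renewable peak

for cap in (0, 5):         # battery capacity in kWh: none, then 5 kWh
    print(cap, brown_energy(night_run, green, cap), brown_energy(noon_run, green, cap))
# 0 6.0 2.0
# 5 1.0 0.0
```

In this toy setting, shifting the job to the renewable peak cuts brown energy from 6 kWh to 2 kWh. Adding a 5 kWh battery lowers it further, to 1 kWh for the night placement and to 0 kWh for the noon placement. A real scheduler must also weigh such shifts against job deadlines and, as this proposal notes, against fairness.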

[1] Bingsheng He et al. Not All Joules are Equal: Towards Energy-Efficient and Green-Aware Data Processing Frameworks. 2016 IEEE International Conference on Cloud Engineering (IC2E), 2016.

[2] Jordi Torres et al. GreenHadoop: leveraging green energy in data-processing frameworks. EuroSys '12, 2012.

[3] Andrew V. Goldberg et al. Quincy: fair scheduling for distributed computing clusters. SOSP '09, 2009.

[4] Mung Chiang et al. Multiresource Allocation: Fairness–Efficiency Tradeoffs in a Unifying Framework. IEEE/ACM Transactions on Networking, 2012.

[5] Baochun Li et al. Dominant resource fairness in cloud computing systems with heterogeneous servers. IEEE INFOCOM 2014 - IEEE Conference on Computer Communications, 2013.

[6] Srikanth Kandula et al. Multi-resource packing for cluster schedulers. SIGCOMM, 2014.

[7] Sonja Klingert et al. Renewable energy-aware data centre operations for smart cities: the DC4Cities approach. 2015 International Conference on Smart Cities and Green ICT Systems (SMARTGREENS), 2015.

[8] Sanjay Ghemawat et al. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.

[9] Randy H. Katz et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. NSDI, 2011.

[10] Ricardo Bianchini et al. Leveraging renewable energy in data centers: present and future. HPDC '12, 2012.

[11] Carlo Curino et al. Apache Hadoop YARN: yet another resource negotiator. SoCC, 2013.

[12] Scott Shenker et al. Choosy: max-min fair sharing for datacenter jobs with constraints. EuroSys '13, 2013.

[13] Jordi Torres et al. GreenSlot: Scheduling energy consumption in green datacenters. 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011.

[14] Bingsheng He et al. Gemini: An Adaptive Performance-Fairness Scheduler for Data-Intensive Cluster Computing. 2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom), 2015.

[15] Thu D. Nguyen et al. Parasol and GreenSwitch: managing datacenters powered by renewable energy. ASPLOS '13, 2013.

[16] Anand Sivasubramaniam et al. Benefits and limitations of tapping into stored energy for datacenters. 2011 38th Annual International Symposium on Computer Architecture (ISCA), 2011.

[17] Michael J. Franklin et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. NSDI, 2012.

[18] Bingsheng He et al. Green Databases Through Integration of Renewable Energy. CIDR, 2013.

[19] Baochun Li et al. On the Fairness-Efficiency Tradeoff for Packet Processing with Multiple Resources. CoNEXT, 2014.