Electron: Towards Efficient Resource Management on Heterogeneous Clusters with Apache Mesos

As data centers continue to grow in scale, the resource management software needs to work closely with the hardware infrastructure to provide high utilization, performance, fault tolerance, and high availability. Apache Mesos has emerged as a leader in this space, providing an abstraction over the entire cluster, data center, or cloud to present a uniform view of all the resources. In addition, frameworks built on Mesos such as Apache Aurora, developed within Twitter and later contributed to the Apache Software Foundation, allow massive job submissions with heterogeneous resource requirements. The availability of such tools in the open source space, with a proven record of large-scale production use, makes them suitable for research on how they can be adapted for use in campus clusters and emerging cloud infrastructures for different workloads in both academia and industry. As data centers run these workloads and strive to maintain high utilization of their components, they incur a significant cost in terms of energy and power consumption. To address this cost we have developed our own framework, Electron, for use with Mesos. Electron is designed to be configurable with heuristic-driven power capping policies along with different scheduling policies such as Bin Packing and First Fit. We characterize the performance of Electron in comparison with the widely used Aurora framework. On average, our experiments show that, with the proper combination of power capping and scheduling policies, Electron can reduce the 95th percentile of CPU and DRAM power usage by 27.89%, total energy consumption by 19.15%, average power consumption by 27.90%, and maximum peak power usage by 16.91%, while maintaining a makespan similar to that of Aurora.
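To make the two scheduling policies named above concrete, the following is a minimal Python sketch of generic First Fit and Bin Packing placement over a set of nodes. It is an illustration under stated assumptions, not Electron's implementation: the `Task` and `Node` types, the two-resource model (CPUs and memory), and the largest-first ordering used in `bin_packing` are all hypothetical choices made for this example, and no Mesos API is used.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task description: a name plus CPU and memory demands.
    name: str
    cpus: float
    mem_mb: int


@dataclass
class Node:
    # Hypothetical node with remaining capacity; stands in for a resource offer.
    name: str
    cpus: float
    mem_mb: int
    tasks: list = field(default_factory=list)

    def fits(self, task):
        return task.cpus <= self.cpus and task.mem_mb <= self.mem_mb

    def place(self, task):
        self.cpus -= task.cpus
        self.mem_mb -= task.mem_mb
        self.tasks.append(task.name)


def first_fit(tasks, nodes):
    """Place each task, in queue order, on the first node that can hold it."""
    unplaced = []
    for task in tasks:
        for node in nodes:
            if node.fits(task):
                node.place(task)
                break
        else:
            # No node had enough remaining capacity for this task.
            unplaced.append(task.name)
    return unplaced


def bin_packing(tasks, nodes):
    """Fill one node at a time with as many tasks as fit, largest first."""
    pending = sorted(tasks, key=lambda t: (t.cpus, t.mem_mb), reverse=True)
    for node in nodes:
        remaining = []
        for task in pending:
            if node.fits(task):
                node.place(task)
            else:
                remaining.append(task)
        pending = remaining
    return [t.name for t in pending]


if __name__ == "__main__":
    queue = [Task("t1", 2, 1024), Task("t2", 3, 1024),
             Task("t3", 2, 1024), Task("t4", 1, 1024)]
    for policy in (first_fit, bin_packing):
        cluster = [Node("A", 4, 8192), Node("B", 4, 8192)]
        leftover = policy(queue, cluster)
        print(policy.__name__, {n.name: n.tasks for n in cluster}, leftover)
```

In this sketch First Fit visits tasks in arrival order and scans nodes for each one, whereas Bin Packing visits nodes in order and tries to fill each before moving on. Which tasks end up sharing a node therefore differs between the two, and that co-location is what a power capping policy would then act on.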
