Macaca: A Scalable and Energy-Efficient Platform for Coupling Cloud Computing with Distributed Embedded Computing

Microservers (embedded devices) with built-in sensors and network connectivity have become increasingly pervasive, and their computational capabilities continue to improve. Many studies have shown that a heterogeneous cluster combining low-power microservers with high-performance nodes can deliver competitive performance efficiency. However, these studies make only simple modifications to existing distributed systems, which has been shown not to fully exploit the heterogeneous resources. In this paper, we argue that such a heterogeneous cluster also calls for flexible and efficient scheduling of computational resources. We introduce Macaca, a platform for sharing and scheduling distributed resources from embedded devices and Linux servers, including computational resources, scale-out storage, and diverse data, to accomplish collaborative processing tasks. In Macaca, we propose a two-layer scheduling mechanism to improve the utilization of these heterogeneous resources. Internally, the resource abstraction layer supports the sophisticated schedulers of existing distributed frameworks and decides how many resources to offer each computing framework, while the resource management layer coordinates overall computational effectiveness and per-device energy management. Furthermore, Macaca adopts a novel strategy that supports smart switching among three system modes to save energy. We evaluate Macaca on a variety of datasets and typical datacenter workloads; the results show that Macaca achieves more efficient resource utilization when the heterogeneous cluster is shared among diverse frameworks.
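The abstract describes a two-layer design in which a resource abstraction layer decides how many resources to offer each framework's own scheduler (in the spirit of two-level schedulers such as Mesos), while a resource management layer handles energy-aware coordination of the nodes. The sketch below is a minimal, hypothetical illustration of that division of responsibility; the class names, the half-of-free-resources offer policy, the capacity figures, and the three mode names are assumptions made for illustration only, not Macaca's actual API.

```python
# Hypothetical sketch of a Macaca-style two-layer scheduler.
# All names and policies here are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpus: float           # free CPU cores
    mem_mb: int           # free memory in MB
    is_microserver: bool  # low-power embedded node vs. Linux server
    mode: str = "balanced"  # "performance" | "balanced" | "energy-saving"

@dataclass
class Framework:
    name: str
    def accept(self, node: "Node", cpus: float, mem_mb: int) -> bool:
        # The framework's own scheduler decides whether to take the offer.
        return cpus >= 1.0 and mem_mb >= 512

class ResourceManagementLayer:
    """Coordinates energy management: switches node modes by utilization."""
    def update_modes(self, nodes):
        for n in nodes:
            capacity = 2.0 if n.is_microserver else 16.0  # assumed core counts
            util = 1.0 - n.cpus / capacity
            if util < 0.2:
                n.mode = "energy-saving"
            elif util > 0.8:
                n.mode = "performance"
            else:
                n.mode = "balanced"

class ResourceAbstractionLayer:
    """Decides how many resources to offer each registered framework."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.frameworks = []

    def register(self, fw: Framework):
        self.frameworks.append(fw)

    def make_offers(self):
        # Offer each framework half of every node's free resources;
        # the framework accepts or declines (two-level scheduling).
        for fw in self.frameworks:
            for node in self.nodes:
                if node.mode == "energy-saving" and node.is_microserver:
                    continue  # leave idle microservers in their low-power mode
                offer_cpus, offer_mem = node.cpus / 2, node.mem_mb // 2
                if fw.accept(node, offer_cpus, offer_mem):
                    node.cpus -= offer_cpus
                    node.mem_mb -= offer_mem
                    print(f"{fw.name}: {offer_cpus} cpus / {offer_mem} MB on {node.name}")

if __name__ == "__main__":
    cluster = [Node("server-0", 16.0, 65536, False),
               Node("edge-0", 2.0, 2048, True)]
    ral = ResourceAbstractionLayer(cluster)
    ral.register(Framework("hadoop"))
    ral.register(Framework("spark"))
    ResourceManagementLayer().update_modes(cluster)
    ral.make_offers()
```

In this sketch the abstraction layer never inspects framework-internal scheduling decisions, which mirrors the paper's claim that existing frameworks' sophisticated schedulers can be reused on top of the shared heterogeneous cluster.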
