Spark on entropy: A reliable & efficient scheduler for low-latency parallel jobs in heterogeneous cloud

In a heterogeneous cloud, providing quality of service (QoS) guarantees for on-line parallel analysis jobs is much more challenging than for off-line ones, mainly because of the many parameters involved, unstable resource performance, varied job patterns and dynamic query workloads. In this paper we propose an entropy-based scheduling strategy that makes running on-line parallel analysis as a service more reliable and efficient, and we implement the proposed idea in Spark. Entropy, as a measure of the degree of disorder in a system, indicates a system's tendency to drift out of order and into a chaotic condition, and it can thus serve as a measure of a cloud resource's reliability for job scheduling. The key idea of our Entropy Scheduler is to construct a new resource entropy metric and to schedule tasks according to a ranking of resources by that metric, so as to provide QoS guarantees for on-line Spark analysis jobs. Experiments demonstrate that our approach reduces the average query response time by 15% to 20% and its standard deviation by 30% to 45% compared with the native Fair Scheduler in Spark.
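
The ranking idea can be illustrated with a minimal sketch. The abstract does not give the definition of the resource entropy metric, so the sketch below assumes a stand-in: the Shannon entropy of each resource's recent task durations, bucketed into fixed-width bins. The names `shannon_entropy`, `rank_resources`, `bin_width`, and the `node-*` data are hypothetical and are not taken from the paper or from Spark's API.

```python
import math
from collections import Counter


def shannon_entropy(samples, bin_width=0.1):
    """Shannon entropy (in bits) of samples bucketed into fixed-width bins.

    ASSUMPTION: this is a stand-in metric, not the paper's resource entropy.
    """
    if not samples:
        return 0.0
    # Count how many samples fall into each bin.
    bins = Counter(math.floor(s / bin_width) for s in samples)
    total = len(samples)
    return sum(-(c / total) * math.log2(c / total) for c in bins.values())


def rank_resources(history, bin_width=0.1):
    """Order resource names from lowest entropy (most stable) to highest."""
    return sorted(history, key=lambda name: shannon_entropy(history[name], bin_width))


# Hypothetical task durations (seconds) observed on three resources.
history = {
    "node-a": [1.0, 1.0, 1.0, 1.0],  # perfectly stable
    "node-b": [1.0, 2.0, 1.0, 2.0],  # alternates between two values
    "node-c": [0.5, 1.5, 2.5, 3.5],  # every duration differs
}

ranking = rank_resources(history)
print(ranking)
```

Under this assumed metric, a scheduler would offer tasks to the resources at the front of `ranking` first, since their past behaviour has been the least disordered.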
