ORiON: Online ResOurce Negotiator for Multiple Big Data Analytics Frameworks

Recent years have seen rapid growth of large-scale analytics applications across a wide range of domains, from healthcare infrastructures to traffic management. The high volume of data that needs to be processed has stimulated the development of special-purpose frameworks that handle the data deluge by parallelizing data processing across multiple computing nodes. These frameworks differ significantly in the policies they follow to decompose workloads into tasks and in the way they exploit the available computing resources. As a result, an application's resource utilization and execution time vary significantly depending on the framework in which it is implemented, so determining the appropriate framework for executing a big data application is not trivial. In this work we propose Orion, a novel resource negotiator for cloud infrastructures that support multiple big data frameworks such as Apache Spark, Apache Flink and TensorFlow. Given an application, Orion determines the most appropriate framework to assign it to and reserves the resources required for the application to meet its performance requirements. Our negotiator exploits state-of-the-art prediction techniques to estimate the application's execution time when it is assigned to a specific framework with varying configuration parameters and processing resources. Finally, our detailed experimental evaluation, using practical big data workloads on our local cluster, illustrates that our approach outperforms its competitors.
