Design and implementation of an analytical framework for interference aware job scheduling on Apache Spark platform

Apache Spark is a recently popularized open-source platform that is increasingly used for large-scale data analytics. Performance prediction in such systems is important for efficient job scheduling and resource allocation, but interference among multiple Apache Spark jobs running concurrently in a virtualized environment makes it extremely difficult. This paper addresses that problem. First, we develop data-driven analytical models to estimate the effect of interference among multiple Apache Spark jobs on job execution time in virtualized cloud environments. Next, we present the design of an interference-aware job scheduling algorithm that leverages the developed analytical framework. We evaluate the accuracy of our models using four real-life applications (PageRank, K-means, logistic regression, and word count) on a 6-node cluster while running up to four jobs concurrently. Our experimental results show that the scheduling algorithm significantly reduces both the average execution time of individual jobs and the total execution time: the reduction ranges between 26% and 47% for individual jobs and between 2% and 13% for total execution time.
