PYTHIA: Admission Control for Multi-Framework, Deadline-Driven, Big Data Workloads

Abstract: In this paper, we present PYTHIA, deadline-aware admission control for systems that execute jobs from multiple big data (batch) frameworks using shared resources. PYTHIA adds support for deadline-driven workloads in resource-constrained cloud settings, for use by resource negotiators such as Apache Mesos or YARN. PYTHIA uses histories of job statistics to estimate the minimum number of CPUs to allocate to a job in order for it to meet its deadline. PYTHIA admits jobs when these resources are available. Any job not admitted "fails fast" and wastes no resources. We implement a PYTHIA prototype and empirically evaluate it using production YARN traces under different resource constraints and deadline assignments. Our results show that PYTHIA is able to meet significantly more deadlines than fair share approaches and wastes fewer cloud resources in resource-limited scenarios, for the workloads, cluster sizes, and deadline assignments that we consider.
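
The admission decision described above can be illustrated with a minimal sketch. It is not PYTHIA's actual estimator, which the abstract does not specify. It assumes a simple "wave" model in which a job consists of independent tasks of equal duration, each task occupies one CPU, and the mean task duration comes from job history. The names `Job`, `min_cpus_for_deadline`, and `admit` are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    num_tasks: int           # hypothetical: number of independent tasks in the job
    avg_task_seconds: float  # hypothetical: mean task duration taken from job history
    deadline_seconds: float  # time remaining until the job's deadline


def min_cpus_for_deadline(job: Job) -> Optional[int]:
    """Return the smallest CPU count that meets the deadline, or None.

    Assumption (not from the paper): tasks run in waves of one task per CPU,
    so the estimated runtime is ceil(num_tasks / cpus) * avg_task_seconds.
    """
    if job.num_tasks <= 0 or job.avg_task_seconds <= 0:
        raise ValueError("num_tasks and avg_task_seconds must be positive")
    # Number of full task waves that fit before the deadline.
    waves = math.floor(job.deadline_seconds / job.avg_task_seconds)
    if waves < 1:
        return None  # not even one task can finish in time
    # Fewest CPUs such that all tasks fit into the available waves.
    return math.ceil(job.num_tasks / waves)


def admit(job: Job, free_cpus: int) -> bool:
    """Admit the job only if its estimated minimum CPU need is available."""
    needed = min_cpus_for_deadline(job)
    return needed is not None and needed <= free_cpus
```

Under these assumptions, a job that is rejected by `admit` never starts, which corresponds to the "fails fast" behavior: it consumes no CPUs that an admissible job could use.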
