A Flexible Heuristic to Schedule Distributed Analytic Applications in Compute Clusters

This work addresses the problem of scheduling user-defined analytic applications, which we define as high-level compositions of frameworks, their components, and the logic necessary to carry out work. The key idea in our application definition is to distinguish classes of components, including core and elastic types: the former are required for an application to make progress, while the latter contribute to reduced execution times. We show that scheduling such applications poses new challenges, which existing approaches address inefficiently.
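The core/elastic distinction can be illustrated with a minimal sketch. This is not the heuristic proposed in this work: the `Application` class, the `allocate` function, and the assumption that every component occupies exactly one resource slot are all hypothetical, chosen only to show how the two component classes might be treated differently at allocation time.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical application description (names are illustrative only)."""
    name: str
    core: int     # components required for the application to make progress
    elastic: int  # optional components that only reduce execution time


def allocate(app: Application, free_slots: int) -> int:
    """Return how many components to start, assuming one slot per component.

    Returns 0 when the core components do not all fit, since the
    application could not make progress without them.
    """
    if free_slots < app.core:
        return 0
    # Core components are granted first; elastic ones use whatever is left.
    spare = free_slots - app.core
    return app.core + min(app.elastic, spare)


if __name__ == "__main__":
    app = Application(name="example", core=3, elastic=5)
    for slots in (2, 5, 10):
        print(slots, allocate(app, slots))
```

Under these assumptions, an application starts only once all of its core components fit, and then grows with elastic components as capacity allows. A real scheduler would also have to decide how to order competing applications and when to reclaim elastic components, which this sketch leaves out.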
