This paper is included in the Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16). Graphene: Packing and Dependency-aware Scheduling for Data-parallel Clusters

We present a new cluster scheduler, GRAPHENE, aimed at jobs that have a complex dependency structure and heterogeneous resource demands. Relaxing either of these challenges, i.e., scheduling a DAG of homogeneous tasks or an independent set of heterogeneous tasks, leads to NP-hard problems. Reasonable heuristics exist for these simpler problems, but they perform poorly when scheduling heterogeneous DAGs. Our key insights are: (1) focus on the long-running tasks and those with tough-to-pack resource demands, and (2) compute a DAG schedule, offline, by first scheduling such troublesome tasks and then scheduling the remaining tasks without violating dependencies. These offline schedules are distilled to a simple precedence order and are enforced by an online component that scales to many jobs. The online component also uses heuristics to compactly pack tasks and to trade off fairness for faster job completion. Evaluation on a 200-server cluster and on traces of production DAGs at Microsoft shows that GRAPHENE improves median job completion time by 25% and cluster throughput by 30%.
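To make the idea of a precedence order that favors troublesome tasks concrete, here is a minimal Python sketch. It is not GRAPHENE's actual algorithm: the `Task` fields, the `trouble_score` formula (duration times peak fractional demand), and the rule that a task inherits the highest score among its descendants are all assumptions chosen for illustration. The sketch only shows how a dependency-respecting order can pull long-running or hard-to-pack tasks, and the tasks they depend on, toward the front.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    duration: float   # estimated runtime, arbitrary units (assumed field)
    demand: tuple     # fractional resource demands, e.g. (cpu, memory)


def trouble_score(task):
    # Hypothetical score: long tasks with a large peak demand rank higher.
    return task.duration * max(task.demand)


def precedence_order(tasks, deps):
    """Return a dependency-respecting order that favors troublesome tasks.

    tasks: dict mapping name -> Task
    deps:  dict mapping name -> set of parent names that must finish first
    """
    children = {name: set() for name in tasks}
    for child, parents in deps.items():
        for parent in parents:
            children[parent].add(child)

    # A task inherits the highest score among itself and its descendants,
    # so the ancestors of a troublesome task are also pulled forward.
    memo = {}

    def inherited(name):
        if name not in memo:
            scores = [trouble_score(tasks[name])]
            scores.extend(inherited(c) for c in children[name])
            memo[name] = max(scores)
        return memo[name]

    # Priority-driven topological sort: among ready tasks, pop the one
    # with the highest inherited score (ties broken by name).
    remaining = {name: len(deps.get(name, ())) for name in tasks}
    ready = [(-inherited(n), n) for n, k in remaining.items() if k == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for child in children[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (-inherited(child), child))

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


example_tasks = {
    "a": Task("a", 1.0, (0.1, 0.1)),
    "b": Task("b", 10.0, (0.9, 0.2)),   # long and hard to pack
    "c": Task("c", 1.0, (0.2, 0.2)),
    "d": Task("d", 2.0, (0.5, 0.5)),
}
example_deps = {"b": {"a"}, "d": {"c"}}
print(precedence_order(example_tasks, example_deps))  # ['a', 'b', 'c', 'd']
```

In this toy example, task `a` is cheap on its own but is ordered first because the troublesome task `b` depends on it. An online component could then use such an order as a tie-breaking priority when deciding which runnable task to place next.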
