Scheduling tasks with precedence constraints on multiple servers

We consider the problem of scheduling jobs that are modeled by directed acyclic graphs (DAGs). In such graphs, nodes represent the tasks of a job and edges represent precedence constraints on processing these tasks. The DAG scheduling problem, also known as scheduling in fork-join processing networks, is motivated by applications such as job scheduling in data centers and cloud computing and patient flow scheduling in health systems. We consider a flexible system, in which servers may process different, possibly overlapping, sets of task types. In this paper, we first discuss the difficulties in designing provably efficient policies for DAG scheduling, which arise from interactions between the flexibility of the processing environment and the precedence constraints in the system. A major difficulty is the classical synchronization issue, which is further complicated by system flexibility. We then propose two queueing networks that model the scheduling problem and overcome this difficulty. These networks consist of virtual queues that enable us to design provably efficient scheduling policies. We show that the well-known Max-Weight policy for these queueing networks is throughput-optimal. Finally, to compare the delay performance of the two queueing networks, we consider a simplified model in which tasks and servers are identical. We characterize their delay performance under a simple first-come-first-served policy, via a novel coupling argument.
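
To illustrate the Max-Weight idea in a flexible system, the following is a minimal sketch in which each server picks, among the task types it can process, the one with the largest product of queue length and service rate. The function name `max_weight_assignment`, the inputs `queues` and `rates`, and the example numbers are assumptions made for illustration only. This is the generic textbook form of the rule; the policy studied in the paper is defined on its virtual queueing networks, whose exact weights are not given in this abstract.

```python
def max_weight_assignment(queues, rates):
    """Generic Max-Weight choice for flexible servers (illustrative only).

    queues: dict mapping task type -> current queue length
    rates:  dict mapping server -> {task type: service rate}, listing only
            the task types that server is able to process
    Returns a dict mapping server -> chosen task type (None means idle).
    """
    assignment = {}
    for server, capable in rates.items():
        best_type, best_weight = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for task_type in sorted(capable):
            weight = queues.get(task_type, 0) * capable[task_type]
            if weight > best_weight:
                best_type, best_weight = task_type, weight
        # Note: this simplified version decides per server independently,
        # so several servers may select the same queue.
        assignment[server] = best_type
    return assignment


# Hypothetical example: three task types, servers with overlapping skills.
queues = {"map": 4, "shuffle": 1, "reduce": 3}
rates = {
    "s1": {"map": 1.0, "shuffle": 2.0},
    "s2": {"shuffle": 1.0, "reduce": 0.5},
    "s3": {"join": 1.0},  # no "join" tasks are waiting, so s3 idles
}
result = max_weight_assignment(queues, rates)
print(result)
```

In this example `s1` compares weights 4 (map) and 2 (shuffle), and `s2` compares 1 (shuffle) and 1.5 (reduce); a server whose eligible queues are all empty stays idle.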
