Communication-Aware Scheduling of Precedence-Constrained Tasks on Related Machines

Scheduling precedence-constrained tasks is a classical problem that has been studied for more than fifty years. However, little progress has been made in the setting where there are communication delays between tasks. Results for the case of identical machines were derived nearly thirty years ago, and yet no results for related machines have followed. In this work, we propose a new scheduler, Generalized Earliest Time First (GETF), and provide the first provable, worst-case approximation guarantees for the goals of minimizing both the makespan and total weighted completion time of tasks with precedence constraints on related machines with machine-dependent communication times.

[1]  Scott Shenker,et al.  Usenix Association 10th Usenix Symposium on Networked Systems Design and Implementation (nsdi '13) 185 Effective Straggler Mitigation: Attack of the Clones , 2022 .

[2]  Ling Liu,et al.  Purlieus: Locality-aware resource allocation for MapReduce in a cloud , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[3]  Sangeetha Abdu Jyothi,et al.  TicTac: Accelerating Distributed Deep Learning with Communication Scheduling , 2018, MLSys.

[4]  Ola Svensson,et al.  Conditional hardness of precedence constrained scheduling on identical machines , 2010, STOC '10.

[5]  Edward G. Coffman,et al.  Computer and job-shop scheduling theory , 1976 .

[6]  Daniel Gajski,et al.  Hypertool: A Programming Aid for Message-Passing Systems , 1990, IEEE Trans. Parallel Distributed Syst..

[7]  Fabián A. Chudak,et al.  Approximation algorithms for precedence-constrained scheduling problems on parallel machines that run at different speeds , 1997, SODA '97.

[8]  Maciej Drozdowski,et al.  Scheduling for Parallel Processing , 2009, Computer Communications and Networks.

[9]  Ruben Mayer,et al.  The tensorflow partitioning and scheduling problem: it's the critical path! , 2017, ArXiv.

[10]  Albert G. Greenberg,et al.  Reining in the Outliers in Map-Reduce Clusters using Mantri , 2010, OSDI.

[11]  Maurice Queyranne,et al.  Approximation Bounds for a General Class of Precedence Constrained Parallel Machine Scheduling Problems , 1998, SIAM J. Comput..

[12]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[13]  Shi Li,et al.  Scheduling to Minimize Total Weighted Completion Time via Time-Indexed Linear Programming Relaxations , 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[14]  Ronald L. Graham,et al.  Bounds on Multiprocessing Timing Anomalies , 1969, SIAM Journal of Applied Mathematics.

[15]  Y.-K. Kwok,et al.  Static scheduling algorithms for allocating directed task graphs to multiprocessors , 1999, CSUR.

[16]  David B. Shmoys,et al.  Scheduling to minimize average completion time: off-line and on-line algorithms , 1996, SODA '96.

[17]  Tao Yang,et al.  DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors , 1994, IEEE Trans. Parallel Distributed Syst..

[18]  Jan Karel Lenstra,et al.  Complexity of Scheduling under Precedence Constraints , 1978, Oper. Res..

[19]  Carlo Curino,et al.  Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.

[20]  Adam Wierman,et al.  This Paper Is Included in the Proceedings of the 11th Usenix Symposium on Networked Systems Design and Implementation (nsdi '14). Grass: Trimming Stragglers in Approximation Analytics Grass: Trimming Stragglers in Approximation Analytics , 2022 .

[21]  Frank D. Anger,et al.  Scheduling Precedence Graphs in Systems with Interprocessor Communication Times , 1989, SIAM J. Comput..

[22]  Minghong Lin,et al.  Joint optimization of overlapping phases in MapReduce , 2013, PERV.

[23]  Salim Hariri,et al.  Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing , 2002, IEEE Trans. Parallel Distributed Syst..

[24]  Xiaoqiao Meng,et al.  Delay tails in MapReduce scheduling , 2012, SIGMETRICS '12.

[25]  Qingbo Wu,et al.  Workflow scheduling in cloud: a survey , 2015, The Journal of Supercomputing.

[26]  Adam Wierman,et al.  Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale , 2015, SIGCOMM.

[27]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[28]  Randy H. Katz,et al.  Improving MapReduce Performance in Heterogeneous Environments , 2008, OSDI.

[29]  Kenli Li,et al.  A Hybrid Chemical Reaction Optimization Scheme for Task Scheduling on Heterogeneous Computing Systems , 2015, IEEE Transactions on Parallel and Distributed Systems.

[30]  Subhash Khot,et al.  Optimal Long Code Test with One Free Bit , 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[31]  Roy H. Campbell,et al.  Two Sides of a Coin: Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve Cluster Performance , 2012, 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[32]  Lei Ying,et al.  MapTask Scheduling in MapReduce With Data Locality: Throughput and Heavy-Traffic Optimality , 2013, IEEE/ACM Transactions on Networking.

[33]  Abbas Bazzi,et al.  Towards Tight Lower Bounds for Scheduling Problems , 2015, ESA.