Revisiting dynamic DAG scheduling under memory constraints for shared-memory platforms

This work focuses on dynamic DAG scheduling under memory constraints. We target a shared-memory platform equipped with p parallel processors. We aim to bound the maximum amount of memory that may be needed by any schedule using p processors to execute the DAG. We refine the classical model that computes maximum cuts by introducing two types of memory edges in the DAG: black edges for regular precedence constraints and red edges for actual memory consumption during execution. A valid edge cut cannot include more than p red edges. This constraint has not been taken into account in previous work, and it dramatically changes the complexity of the problem, which was polynomial and becomes NP-hard. We introduce an Integer Linear Program (ILP) to solve it, together with an efficient heuristic based on rounding the rational solution of the ILP. In addition, we propose an exact polynomial algorithm for series-parallel graphs. We provide an extensive set of experiments, both with randomly generated graphs and with graphs arising from practical applications, which demonstrate the impact of resource constraints on peak memory usage.
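The red/black cut model can be illustrated on a toy instance. The sketch below is not the paper's ILP or algorithm: it is a brute-force search over topological cuts of a small hypothetical DAG, assuming that black edges are plain precedence constraints, that red edges carry data resident in memory while the consumer task runs, and that a valid cut may therefore cross at most p red edges. All names, weights, and the graph itself are illustrative.

```python
from itertools import combinations

# Toy DAG with weighted, colored memory edges (illustrative only).
# Black edges: plain precedence constraints. Red edges: data actually
# resident in memory while the consumer executes; with p processors,
# at most p tasks run at once, so a valid cut crosses <= p red edges.
edges = [
    ("s", "a", 3, "red"),
    ("a", "t", 1, "black"),
    ("s", "b", 3, "red"),
    ("b", "t", 1, "black"),
    ("s", "t", 1, "black"),
]
nodes = {u for e in edges for u in (e[0], e[1])}
preds = {v: {u for (u, w, _, _) in edges if w == v} for v in nodes}

def max_cut(p):
    """Max-weight topological cut crossing at most p red edges (brute force)."""
    best = 0
    inner = sorted(nodes - {"s", "t"})
    for r in range(len(inner) + 1):
        for extra in combinations(inner, r):
            S = {"s"} | set(extra)                 # candidate source side
            if any(not preds[v] <= S for v in S):  # must be predecessor-closed
                continue
            cut = [e for e in edges if e[0] in S and e[1] not in S]
            if sum(c == "red" for (_, _, _, c) in cut) <= p:
                best = max(best, sum(wt for (_, _, wt, _) in cut))
    return best
```

On this toy graph, the bound on peak memory grows with the number of red edges a cut may cross: max_cut(0) returns 3, max_cut(1) returns 5, and max_cut(2) returns 7, matching the intuition that more available processors allow more data to be simultaneously live.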
