Tree traversals with task-memory affinities

We study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. We target a heterogeneous architecture with two resource types, each with its own memory, such as a multicore node equipped with a dedicated accelerator (FPGA or GPU). The tasks in the workflow are colored according to their type and can be processed only if all their input and output files fit in the corresponding memory. The amount of memory of each type used at a given execution step depends strongly on the order in which the tasks are executed and on when the communications between the two memories are scheduled. The objective is to determine an efficient traversal that minimizes the maximum amount of memory of each type needed to traverse the whole tree. In this paper, we establish the complexity of this two-memory scheduling problem and provide inapproximability results. In addition, we design several heuristics, based on both post-order and general traversals, and evaluate them on a comprehensive set of tree graphs, including random trees as well as assembly trees arising in the context of sparse matrix factorizations.
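To make the dependence on the execution order concrete, here is a minimal Python sketch that only evaluates a given traversal; it does not search for a good one. The model is a simplified assumption, not necessarily the paper's exact model: each task has a color (memory 0 or 1) and a single output file; a task runs with its children's files and its own output resident in its memory, after which the children's files are freed; and a file whose parent has the other color is transferred eagerly, briefly occupying both memories. The names `peak_memories`, `parent`, `color` and `out_size` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def peak_memories(
    parent: Dict[str, Optional[str]],
    color: Dict[str, int],
    out_size: Dict[str, int],
    order: Sequence[str],
) -> Tuple[int, int]:
    """Return (peak of memory 0, peak of memory 1) for one traversal order.

    Simplified, hypothetical model: a task runs in memory color[t]; its
    children's output files must already be there and its own output file
    is allocated there. After it runs, the children's files are freed.
    If the parent uses the other memory, the output is transferred eagerly:
    it occupies both memories during the copy, then the source is freed.
    """
    children: Dict[str, List[str]] = {t: [] for t in parent}
    for t, p in parent.items():
        if p is not None:
            children[p].append(t)

    used = [0, 0]
    peak = [0, 0]
    done = set()
    for t in order:
        # Precedence constraint: every child must be processed before t.
        if any(c not in done for c in children[t]):
            raise ValueError(f"{t} scheduled before one of its children")
        m = color[t]
        used[m] += out_size[t]            # allocate the output file
        peak[m] = max(peak[m], used[m])   # inputs and output resident together
        for c in children[t]:
            used[m] -= out_size[c]        # inputs consumed, free them
        done.add(t)
        p = parent[t]
        if p is not None and color[p] != m:
            dest = color[p]
            used[dest] += out_size[t]     # copy arrives in the other memory
            peak[dest] = max(peak[dest], used[dest])
            used[m] -= out_size[t]        # source copy released
    return peak[0], peak[1]


parent = {"r": None, "a": "r", "b": "r", "a1": "a"}
out_size = {"r": 0, "a": 1, "b": 8, "a1": 10}
single = {t: 0 for t in parent}             # every task uses memory 0
print(peak_memories(parent, single, out_size, ["a1", "a", "b", "r"]))  # (11, 0)
print(peak_memories(parent, single, out_size, ["b", "a1", "a", "r"]))  # (19, 0)
mixed = {"r": 0, "b": 0, "a": 1, "a1": 1}   # a's subtree on the accelerator
print(peak_memories(parent, mixed, out_size, ["b", "a1", "a", "r"]))   # (9, 11)
```

With a single memory, running `b` first keeps its large output resident while the subtree of `a` reaches its peak, so the two valid orders need 11 and 19 units. When the subtree of `a` is colored for the other memory, that conflict disappears in this toy example, which hints at why the order and the placement of transfers have to be reasoned about jointly for both memories.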
