Compact DAG representation and its symbolic scheduling

Scheduling large task graphs is an important issue in parallel computing. In this paper we tackle the following two problems: (1) how to schedule a task graph that is too large to fit into memory, and (2) how to build a generic program such that the parameter values of a task graph can be supplied at run time. Our answers rely on the parameterized task graph (PTG), a symbolic representation of the task graph. We propose a dynamic scheduling algorithm that takes a PTG as input and allows us to generate a generic program. We present a theoretical study showing that our algorithm finds good schedules for coarse-grain task graphs and has both a low memory cost and a low computational complexity. We prove that, when the average number of operations per task is large enough, the scheduling overhead is negligible with respect to the makespan. We also provide experimental results that demonstrate the feasibility of our approach on several compute-intensive kernels found in numerical scientific applications.
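
To make the idea of a symbolic task graph concrete, here is a minimal Python sketch. It is not the PTG format or the scheduling algorithm of the paper. It assumes a hypothetical Gaussian-elimination-like graph with two task families, `pivot(k)` and `update(k, j)`, whose edges are given by rules over the parameter `n`. The names `successors` and `task_count` are illustrative only. The point is that the successors of a task can be computed on demand once `n` is known at run time, so the full graph never has to be stored.

```python
from typing import Iterator, Tuple

# A task is (kind, k, j); j is 0 for pivot tasks (illustrative encoding).
Task = Tuple[str, int, int]


def successors(task: Task, n: int) -> Iterator[Task]:
    """Yield the successors of `task` from symbolic rules, for parameter n."""
    kind, k, j = task
    if kind == "pivot":
        # Rule: pivot(k) -> update(k, j) for every k < j <= n
        for jj in range(k + 1, n + 1):
            yield ("update", k, jj)
    elif kind == "update":
        if j == k + 1 and k + 1 < n:
            # Rule: update(k, k+1) -> pivot(k+1)
            yield ("pivot", k + 1, 0)
        elif j > k + 1:
            # Rule: update(k, j) -> update(k+1, j)
            yield ("update", k + 1, j)


def task_count(n: int) -> int:
    """Closed-form number of tasks: n-1 pivots plus n(n-1)/2 updates."""
    return (n - 1) + n * (n - 1) // 2


if __name__ == "__main__":
    n = 4  # parameter value supplied at run time
    print(list(successors(("pivot", 1, 0), n)))
    print(task_count(n))
```

The rule set has a fixed size regardless of `n`, whereas the expanded graph in this sketch grows quadratically with `n`.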
