Shortest Path Evaluation for Hierarchical Grain Aggregation

This paper presents techniques for analyzing the data dependence structure of a program to determine an efficient task "grain" structure for parallel execution. A graph parsing technique is used to detect potential parallelism in a directed acyclic graph (DAG). The parse classifies graph components as linear, independent, or primitive. Linear components must be executed serially, while independent components may be executed in parallel. Primitive components carry no specific directive for either parallelization or linearization. The primary contribution of this paper is a new technique for further decomposing primitive structures, which allows the parser to discover linear and independent components contained inside primitive components. The second contribution is an efficient scheduling algorithm that is systematic and flexible. The algorithm exploits the parallelism in a DAG based on its parse tree by modeling the parallelization/aggregation decisions as a multi-stage decision graph and then finding the shortest path through that graph. The enhanced technique is illustrated by its application to the DAG for the Cooley-Tukey Fast Fourier Transform.
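As a rough illustration of the shortest-path step only, the sketch below runs a stage-by-stage dynamic program over a small multi-stage decision graph. It is not the paper's algorithm: the function name `shortest_path`, the node labels, the two hypothetical components "A" and "B", and all edge costs are assumptions made up for the example, and the sketch assumes every edge runs from an earlier stage to a later one.

```python
from typing import Dict, List, Tuple

# Hypothetical decision graph: each stage holds the alternative decisions
# for one component (run its parts in parallel, or aggregate them serially).
# Edge weights stand in for an assumed cost of taking the next decision.
STAGES: List[List[str]] = [
    ["start"],
    ["A:parallel", "A:serial"],
    ["B:parallel", "B:serial"],
    ["end"],
]

EDGES: Dict[str, List[Tuple[str, float]]] = {
    "start": [("A:parallel", 4), ("A:serial", 6)],
    "A:parallel": [("B:parallel", 5), ("B:serial", 3)],
    "A:serial": [("B:parallel", 2), ("B:serial", 4)],
    "B:parallel": [("end", 3)],
    "B:serial": [("end", 1)],
}


def shortest_path(
    stages: List[List[str]],
    edges: Dict[str, List[Tuple[str, float]]],
) -> Tuple[float, List[str]]:
    """Return (cost, path) of the cheapest route from the first to the last stage.

    Assumes the first and last stages each contain exactly one node and that
    every edge leads from an earlier stage to a later one, so processing the
    stages in order finalizes each node before its outgoing edges are relaxed.
    """
    source = stages[0][0]
    sink = stages[-1][0]
    best: Dict[str, float] = {source: 0.0}
    pred: Dict[str, str] = {}

    for stage in stages[:-1]:
        for u in stage:
            if u not in best:  # node not reachable from the source
                continue
            for v, cost in edges.get(u, []):
                candidate = best[u] + cost
                if v not in best or candidate < best[v]:
                    best[v] = candidate
                    pred[v] = u

    if sink not in best:
        raise ValueError("sink is not reachable from source")

    # Walk predecessor links backwards to recover the chosen decisions.
    path = [sink]
    while path[-1] != source:
        path.append(pred[path[-1]])
    path.reverse()
    return best[sink], path


if __name__ == "__main__":
    total, decisions = shortest_path(STAGES, EDGES)
    print(total, decisions)
```

With these made-up costs the cheapest route parallelizes component "A" and aggregates component "B". In a real scheduler the edge weights would come from a cost model of execution and communication time, which this sketch does not attempt to reproduce.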
