Automatic generation of nested, fork-join parallelism

This paper presents an efficient algorithm that automatically generates a parallel program from a dependence-based representation of a sequential program. The resulting parallel program consists of nested fork-join constructs composed from the loops and statements of the sequential program. Data dependences are handled by two techniques: one implicitly satisfies them by sequencing, at the cost of parallelism; where more parallelism would result, the other eliminates them by privatization, the introduction of process-specific private instances of variables. The algorithm also determines when copying the values of such instances into and out of nested parallel constructs yields greater parallelism. This is the first algorithm for automatically generating parallelism for such a general model. The algorithm generates as much parallelism as is possible in our model while minimizing privatization.
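As an informal illustration of the privatization technique described above (a hypothetical sketch, not the paper's algorithm or its representation), consider a loop whose iterations all reuse one shared temporary. That reuse creates a storage-related dependence across iterations, which sequencing would satisfy at the cost of running the loop serially; giving each forked task its own private instance of the temporary eliminates the dependence, so the iterations can execute inside a single fork-join construct:

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(a):
    # Every iteration writes and then reads the same temporary `t`.
    # Reusing this one storage location creates anti/output dependences
    # between iterations, so naive parallel execution is unsafe.
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2        # write the shared temporary
        out[i] = t + 1      # read it back in the same iteration
    return out

def parallel(a):
    # Privatization: each forked task holds its own instance of `t`,
    # removing the cross-iteration dependences. All iterations fork,
    # run concurrently, and join, with each private value copied out
    # into the shared result on the join.
    def body(x):
        t = x * 2           # private instance of the temporary
        return t + 1        # value "copied out" at the join
    with ThreadPoolExecutor() as pool:
        return list(pool.map(body, a))
```

Here `sequential`, `parallel`, and `body` are illustrative names; the copy-out on join loosely mirrors the abstract's point that copying private instances out of a parallel construct can expose parallelism that sequencing would forfeit.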
