GTS: parallelization and vectorization of tight recurrences

In this paper we present a new method for extracting the maximum parallelism or vector operations out of DO loops containing tight recurrences, written in sequential programming languages. We call the method Graph Traverse Scheduling (GTS); it is devised to produce code for shared-memory multiprocessors or vector machines. When parallelizing, hardware support for fast synchronization is assumed. The method is presented for singly nested loops containing one or several recurrences, and we show how parallel and vector code is generated. Based on the dependence graph of a loop, we first evaluate its parallelism and the vector length of its statements. We then apply GTS to distribute loop iterations among tasks or to generate vector operations of a given length. When GTS is applied for parallel code generation, dependences not preserved by the sequential execution of each task must be enforced by explicit synchronization; a method for minimizing the number of these explicit synchronizations is also presented. We also show how to extract the synchronization-free parallelism, obtaining fully independent tasks. When GTS is applied for vector code generation, a sequential loop of vector operations is obtained.
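The idea of distributing iterations among tasks by traversing the dependence graph can be illustrated on the simplest case the abstract mentions: a loop with a single recurrence of constant distance d. Iterations that share the same residue i mod d form one chain of the dependence graph, and the d chains carry no dependences between them, so they yield fully independent (synchronization-free) tasks. The sketch below is a hypothetical illustration of that partitioning, not the paper's actual code generator; the function names and the simulated-task structure are assumptions.

```python
# Illustrative sketch (not the paper's implementation): for the loop
#   DO i = d, n-1:  a[i] = a[i-d] + b[i]
# the only dependence has constant distance d, so iterations with equal
# i mod d form one dependence chain, and the d chains are independent.

def sequential(a, b, d):
    """Reference sequential execution of the recurrence."""
    a = list(a)
    for i in range(d, len(a)):
        a[i] = a[i - d] + b[i]
    return a

def chain_partitioned(a, b, d):
    """Execute the loop chain by chain; each chain is one task.

    Within a chain, a[i] depends only on a[i-d], which belongs to the
    same chain, so the d chains could run concurrently with no
    explicit synchronization (the synchronization-free case).
    """
    a = list(a)
    for chain in range(d):          # one hypothetical task per chain
        for i in range(d + chain, len(a), d):
            a[i] = a[i - d] + b[i]
    return a

n, d = 12, 3
a0 = list(range(n))
b = [1] * n
assert chain_partitioned(a0, b, d) == sequential(a0, b, d)
```

When the loop contains several recurrences with different distances, the chains of the combined dependence graph are no longer disjoint, which is where the explicit synchronizations (and the paper's method for minimizing them) come in.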
