Obtaining synchronization-free code with maximum parallelism

This paper addresses the problem of extracting the maximum synchronization-free parallelism that may be present in loops. To reduce communication and synchronization overheads, some parallelizing compilers try to identify independent computational partitions of a sequential program, if any exist. We focus on loops with constant dependence distance vectors. We take a statement instance as the basic unit that can be allocated to a processor, in contrast to other methods, which use an iteration instance. We show that a previously proposed family of scheduling heuristics (Graph Traversal Scheduling) is optimal in the sense that no more parallelism can be expressed with synchronization-free code. Furthermore, we give a quasi-linear time algorithm to find such an optimal Graph Traversal Scheduling.
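
To make the notion of independent computational partitions concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes a one-dimensional loop with a single statement (so statement instances and iteration instances coincide) and a list of constant dependence distances. It groups iterations into the connected components of the dependence graph using a union-find structure. Iterations in different components share no dependence, so each component can run on its own processor without synchronization. The function name `independent_partitions` and its parameters are hypothetical.

```python
def independent_partitions(n, distances):
    """Group iterations 0..n-1 into synchronization-free partitions.

    Assumption (illustrative only): iteration i depends on iteration
    i - d for every constant distance d in `distances`.
    """
    parent = list(range(n))  # union-find forest, one node per iteration

    def find(x):
        # Find the representative of x, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two endpoints of every dependence that stays in bounds.
    for i in range(n):
        for d in distances:
            if i - d >= 0:
                root_a, root_b = find(i), find(i - d)
                if root_a != root_b:
                    parent[root_a] = root_b

    # Collect iterations by representative; each group is one partition.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Distances 4 and 6 over 12 iterations give two independent partitions.
    print(independent_partitions(12, [4, 6]))
    # prints [[0, 2, 4, 6, 8, 10], [1, 3, 5, 7, 9, 11]]
```

Within each partition the original sequential order must still be respected; the sketch only identifies which iterations never need to communicate with each other.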

[1] Jordi Torres et al. GTS: parallelization and vectorization of tight recurrences, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).