Obtaining synchronization-free code with maximum parallelism
This paper addresses the problem
of extracting the maximum synchronization-free parallelism
that may be present in loops.
To reduce communication and synchronization overheads, some
parallelizing compilers try to identify independent
computational partitions of a sequential program,
if any exist. We focus on the case of loops with
constant dependence distance vectors. We consider a
statement instance as a basic unit that can be allocated to a
processor, in contrast to other methods that use an iteration instance.
We show that a previously proposed family of scheduling heuristics
(Graph Traversal Scheduling) is optimal
in the sense that no more parallelism can be expressed
with synchronization-free code. Furthermore,
we give a quasi-linear time algorithm to find
such an optimal Graph Traversal Scheduling.
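
To make the notion of independent partitions at statement-instance granularity concrete, the following is a minimal brute-force sketch, not the Graph Traversal Scheduling algorithm itself. It assumes a one-dimensional loop, represents each dependence as a hypothetical triple (source statement, destination statement, constant distance), and groups statement instances into connected components of the dependence graph with union-find. Each component can run on its own processor without synchronizing with the others. The function name, the triple format, and the example loop are illustrative assumptions.

```python
from collections import defaultdict


def sync_free_partitions(n_iters, deps):
    """Group statement instances into independent partitions.

    n_iters: number of loop iterations, numbered 0 .. n_iters - 1.
    deps: list of (src_stmt, dst_stmt, distance) triples, meaning that
          instance (src_stmt, i) must precede (dst_stmt, i + distance).
    Returns a sorted list of partitions, each a sorted list of
    (statement, iteration) pairs.
    """
    stmts = sorted({s for s, _, _ in deps} | {t for _, t, _ in deps})
    # Union-find forest over all statement instances.
    parent = {(s, i): (s, i) for s in stmts for i in range(n_iters)}

    def find(x):
        # Find the representative, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two endpoints of every dependence inside the loop bounds.
    for src, dst, dist in deps:
        for i in range(n_iters):
            j = i + dist
            if 0 <= j < n_iters:
                ra, rb = find((src, i)), find((dst, j))
                if ra != rb:
                    parent[ra] = rb

    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical loop body:
#   S1: a[i] = b[i-1]
#   S2: b[i] = a[i-1]
# S1 reads what S2 wrote one iteration earlier, and vice versa.
parts = sync_free_partitions(4, [("S2", "S1", 1), ("S1", "S2", 1)])
for p in parts:
    print(p)
```

In this example every iteration depends on the previous one, so allocating whole iterations to processors exposes no parallelism, whereas allocating statement instances yields two independent chains. The sketch enumerates every instance and is therefore far slower than the quasi-linear algorithm the abstract refers to; it only illustrates what is being computed.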
[1] Jordi Torres et al. GTS: parallelization and vectorization of tight recurrences, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).