Synchronization-Free Automatic Parallelization for Arbitrarily Nested Affine Loops

This paper presents a new approach for extracting synchronization-free parallelism available in program loop nests. The approach applies to arbitrarily nested parametric loop nests whose loop bounds and data accesses are affine functions of loop indices and symbolic parameters. Parallelization is realized by means of the transitive closure of a dependence graph. Parallelism is exposed by forming kernels of computations, expressed in the OpenMP standard, that are executed independently on multi-core computers. The speed-up of the parallel code produced by the approach is studied on the NAS benchmark suite, and the results of an experimental study carried out on the Intel Xeon Phi many integrated core architecture are discussed.
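To make the target of the approach concrete, the sketch below shows the kind of code it is concerned with: a parametric affine loop nest whose outer iterations form synchronization-free slices that can be run as an OpenMP parallel kernel. The loop bounds, array, and statement body here are illustrative placeholders chosen for this example; they are not taken from the paper or from the NAS benchmarks.

/* Minimal sketch (assumed example): a parametric affine loop nest whose
   iterations over i are mutually independent, so they can be executed as a
   synchronization-free OpenMP kernel. Compile with: cc -fopenmp example.c */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
    /* Symbolic parameter n: bounds and accesses are affine in n and the indices. */
    int n = (argc > 1) ? atoi(argv[1]) : 1000;
    double *a = malloc((size_t)n * n * sizeof(double));

    /* Each value of i indexes an independent slice: a[i][j] depends only on
       a[i][j-1] within the same slice, so no synchronization between threads
       is required. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        a[(size_t)i * n] = (double)i;
        for (int j = 1; j < n; j++)
            a[(size_t)i * n + j] = a[(size_t)i * n + j - 1] + 1.0;
    }

    printf("a[n-1][n-1] = %f\n", a[(size_t)(n - 1) * n + (n - 1)]);
    free(a);
    return 0;
}

The approach described in the paper discovers such independent slices automatically, using the transitive closure of the dependence graph, rather than relying on the programmer to identify them.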