Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations

Automatic coarse-grained parallelization of program loops is of great importance for multi-core computing systems. This paper compares Iteration Space Slicing and Affine Transformation Framework algorithms aimed at extracting the coarse-grained parallelism available in arbitrarily nested parameterized affine loops. We demonstrate that Iteration Space Slicing extracts more coarse-grained parallelism than the Affine Transformation Framework. Experimental results show that, by means of Iteration Space Slicing algorithms, we are able to extract coarse-grained parallelism for most loops of the NAS and UTDSP benchmarks, and that advanced algorithms for calculating the exact transitive closure of dependence relations are needed in order to increase the applicability of the slicing framework.
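To make the notion of synchronization-free coarse-grained parallelism concrete, the sketch below shows a hypothetical C/OpenMP loop nest (not taken from the paper or its benchmarks): flow dependences link only iterations with the same value of i, so the iteration space decomposes into independent slices, one dependence chain per i, that can run as coarse-grained parallel tasks without inter-thread synchronization. The array name a and the bounds N and M are illustrative assumptions.

```c
/* Minimal illustrative sketch (assumed example, not from the paper):
 * the only flow dependence is S(i,j) -> S(i,j+1), i.e. a[i][j] depends
 * on a[i][j-1].  Each value of i therefore forms an independent slice,
 * and the outer loop can be parallelized without synchronization. */
#include <stdio.h>
#include <omp.h>

#define N 8
#define M 8

int main(void) {
    static double a[N][M + 1];

    /* initialize the array */
    for (int i = 0; i < N; i++)
        for (int j = 0; j <= M; j++)
            a[i][j] = i + j;

    /* one coarse-grained, synchronization-free slice per value of i */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 1; j <= M; j++)
            a[i][j] = a[i][j - 1] + 1.0;  /* serial chain inside a slice */

    printf("a[N-1][M] = %f\n", a[N - 1][M]);
    return 0;
}
```

Compiled with an OpenMP-enabled compiler (e.g. gcc -fopenmp), each slice is executed by one thread; dependences within a slice are respected by the sequential inner loop, while no dependence crosses slice boundaries.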
