Free scheduling for statement instances of parameterized arbitrarily nested affine loops

We present an approach for building a free schedule for statement instances of parameterized, arbitrarily nested affine loops. Under a free schedule, each statement instance is executed as soon as its operands are available. This exposes maximal fine-grained loop parallelism and minimizes the number of synchronization events. The approach is based on calculating the power k of a relation that represents exactly all dependences in a loop. In general, such a relation is a union of simpler relations. When the number of these simpler dependence relations is too large for free scheduling to be calculated, we discuss another technique that extracts free scheduling in an iteration subspace defined by the indices of the inner nests of the loop. We demonstrate that if the power k of a dependence relation describing all dependences in the loop can be calculated, then free scheduling can also be produced. We outline experimental results on the effectiveness, efficiency, and time complexity of the algorithms, and discuss problems to be resolved in the future to utilize the full power of the presented techniques.
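To make the notion of a free schedule concrete, the following is a minimal sketch in Python. It is not the symbolic, parameterized computation described above: it assumes fixed loop bounds and an explicitly enumerated dependence relation, and it assigns each statement instance the earliest time step at which all of its dependence sources have already executed. The function name `free_schedule`, the example loop, and the bound `N = 3` are assumptions chosen for illustration.

```python
from collections import defaultdict


def free_schedule(instances, deps):
    """Map each statement instance to its earliest execution time.

    instances: iterable of hashable instance identifiers
    deps: iterable of (source, target) pairs, assumed to be acyclic
    """
    preds = defaultdict(set)
    for src, dst in deps:
        preds[dst].add(src)

    time = {}
    remaining = set(instances)
    k = 0
    while remaining:
        # Instances whose dependence sources were all scheduled at earlier steps.
        ready = {v for v in remaining if all(p in time for p in preds[v])}
        if not ready:
            raise ValueError("dependence cycle or unknown source instance")
        for v in ready:
            time[v] = k
        remaining -= ready
        k += 1
    return time


# Hypothetical example loop (illustration only):
#   for i in 1..N: for j in 1..N: a[i][j] = a[i-1][j] + a[i][j-1]
N = 3
instances = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]
deps = []
for i, j in instances:
    if i > 1:
        deps.append(((i - 1, j), (i, j)))  # flow dependence through a[i-1][j]
    if j > 1:
        deps.append(((i, j - 1), (i, j)))  # flow dependence through a[i][j-1]

schedule = free_schedule(instances, deps)
```

In this example the instance (i, j) is scheduled at time i + j - 2, so all instances on the same anti-diagonal can run in parallel. In this enumerated setting, time step k holds exactly the instances whose longest dependence chain from an instance with no incoming dependences has length k; the symbolic approach works with the power k of the dependence relation instead of enumerating instances.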
