Using Free Scheduling for Programming Graphic Cards

We present an approach for building a free schedule for statement instances of affine loops. Under the free schedule, each loop statement instance is executed as soon as its operands are available. To describe and implement the approach, we use the dependence analysis of Pugh and Wonnacott, in which dependences are represented as tuple relations. The proposed algorithm has been implemented and verified using the Omega project software. We discuss the results of experiments with the NAS benchmark suite and study the speed-up and efficiency of the parallel code produced by the approach. We also outline problems that must be resolved to enhance the power of the presented technique.
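To make the notion of a free schedule concrete, the following minimal sketch assigns each statement instance the earliest time step at which all of its operands are available. It is an illustration only: it enumerates instances and dependences explicitly for a small, fixed loop bound, whereas the approach described here works symbolically on tuple relations. The function names `free_schedule` and `example_loop_schedule`, and the example loop itself, are assumptions made for this sketch and do not come from the paper.

```python
from collections import defaultdict


def free_schedule(instances, deps):
    """Return {instance: time step} under the free schedule.

    instances: iterable of hashable statement instances.
    deps: iterable of (source, target) pairs; target needs source's result.
    An instance with no predecessors runs at step 0; any other instance runs
    one step after the latest of its predecessors.
    """
    preds_left = {s: 0 for s in instances}
    succs = defaultdict(list)
    for src, dst in deps:
        succs[src].append(dst)
        preds_left[dst] += 1

    time = {}
    ready = [s for s in preds_left if preds_left[s] == 0]
    step = 0
    while ready:
        next_ready = []
        for s in ready:
            time[s] = step
            for t in succs[s]:
                preds_left[t] -= 1
                # t becomes ready only once its last predecessor has run.
                if preds_left[t] == 0:
                    next_ready.append(t)
        ready = next_ready
        step += 1
    return time


def example_loop_schedule(n):
    """Hypothetical loop: a[i][j] = a[i-1][j] + a[i][j-1] for 1 <= i, j <= n."""
    instances = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    deps = []
    for i, j in instances:
        if i > 1:
            deps.append(((i - 1, j), (i, j)))
        if j > 1:
            deps.append(((i, j - 1), (i, j)))
    return free_schedule(instances, deps)


if __name__ == "__main__":
    schedule = example_loop_schedule(3)
    for instance in sorted(schedule, key=lambda s: (schedule[s], s)):
        print(schedule[instance], instance)
```

For this example loop, all instances that share a time step lie on the same anti-diagonal of the iteration space and are mutually independent, so they could be executed in parallel, for instance by separate threads of a graphics card.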

[1]  Volodymyr Beletskyy,et al.  Finding Free Schedules for Non-uniform Loops , 2003, Euro-Par.

[2]  Wlodzimierz Bielecki,et al.  Calculating Exact Transitive Closure for a Normalized Affine Integer Tuple Relation , 2009, Electron. Notes Discret. Math..

[3]  Frédéric Vivien,et al.  Optimal Fine and Medium Grain Parallelism Detection in Polyhedral Reduced Dependence Graphs , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[4]  Marek Palkowski,et al.  Extracting Both Affine and Non-linear Synchronization-Free Slices in Program Loops , 2009, PPAM.

[5]  P. Sadayappan,et al.  Removal of Redundant Dependences in DOACROSS Loops with Constant Dependences , 1991, IEEE Trans. Parallel Distributed Syst..

[6]  Paul Feautrier.  Some efficient solutions to the affine scheduling problem , 1992 .

[7]  Uday Bondhugula,et al.  A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.

[8]  Anna Beletska,et al.  An Iterative Algorithm of Computing the Transitive Closure of a Union of Parameterized Affine Integer Tuple Relations , 2010, COCOA.

[9]  Yves Robert,et al.  Constructive Methods for Scheduling Uniform Loop Nests , 1994, IEEE Trans. Parallel Distributed Syst..

[10]  William Pugh,et al.  An Exact Method for Analysis of Value-based Array Data Dependences , 1993, LCPC.

[11]  Ding-Kai Chen,et al.  Compiler optimizations for parallel loops with fine-grained synchronization , 1994 .

[12]  Albert Cohen,et al.  Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations , 2009, 2009 Eighth International Symposium on Parallel and Distributed Computing.

[13]  Patrick Le Gouëslier d'Argence,et al.  Affine Scheduling on Bounded Convex Polyhedric Domains is Asymptotically Optimal , 1998, Theor. Comput. Sci..

[14]  Paul Feautrier,et al.  Some efficient solutions to the affine scheduling problem. I. One-dimensional time , 1992, International Journal of Parallel Programming.

[15]  Frédéric Vivien.  On the Optimality of Feautrier's Scheduling Algorithm , 2002, Euro-Par.

[16]  Yves Robert,et al.  Linear Scheduling Is Nearly Optimal , 1991, Parallel Process. Lett..

[17]  Cédric Bastoul,et al.  Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[18]  David A. Padua,et al.  A Comparison of Four Synchronization Optimization Techniques , 1991, ICPP.

[19]  David A. Padua,et al.  Compiler Algorithms for Synchronization , 1987, IEEE Transactions on Computers.

[20]  William Pugh,et al.  The Omega Library interface guide , 1995 .

[21]  Monica S. Lam,et al.  An affine partitioning algorithm to maximize parallelism and minimize communication , 1999, ICS '99.

[22]  Paul Feautrier,et al.  Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time , 1992, International Journal of Parallel Programming.

[23]  W. Pugh,et al.  A framework for unifying reordering transformations , 1993 .