Minimizing Synchronization in Parallel Nested Loops

Although computer system architecture and throughput improve continuously, the demand for high computational speed and power in many scientific applications grows every day. As a result, the implementation of parallel applications has gained more attention. Since nested loops are the most time-consuming parts of most programs, we propose a method for scheduling uniform nested loops onto processors, based on the equation of a straight line that contains the maximum possible number of dependence vectors. Experimental results show that the proposed method incurs less communication between processors than similar methods.
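The idea of scheduling along a line that covers the most dependence vectors can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a two-dimensional iteration space, lines through the origin, and a simple cyclic mapping of parallel lines to processors. All function names (`direction`, `dominant_direction`, `line_id`, `assign`, `cross_processor_deps`) are hypothetical.

```python
from collections import Counter
from math import gcd


def direction(vec):
    """Reduce a 2-D dependence vector to a canonical primitive direction."""
    dx, dy = vec
    g = gcd(abs(dx), abs(dy))
    if g == 0:
        raise ValueError("the zero vector is not a dependence")
    dx, dy = dx // g, dy // g
    # Treat d and -d as the same line through the origin.
    if dx < 0 or (dx == 0 and dy < 0):
        dx, dy = -dx, -dy
    return dx, dy


def dominant_direction(dep_vectors):
    """Direction of the line through the origin holding the most vectors."""
    counts = Counter(direction(v) for v in dep_vectors)
    return max(counts.items(), key=lambda kv: kv[1])[0]


def line_id(point, d):
    """Iterations on the same line parallel to d share this value."""
    (i, j), (dx, dy) = point, d
    return dy * i - dx * j


def assign(point, d, num_procs):
    """Assumed mapping: lines are dealt to processors cyclically."""
    return line_id(point, d) % num_procs


def cross_processor_deps(n, dep_vectors, d, num_procs):
    """Count dependences in an n x n space whose endpoints sit on different processors."""
    total = 0
    for i in range(n):
        for j in range(n):
            for vi, vj in dep_vectors:
                ti, tj = i + vi, j + vj
                if 0 <= ti < n and 0 <= tj < n:
                    if assign((i, j), d, num_procs) != assign((ti, tj), d, num_procs):
                        total += 1
    return total


if __name__ == "__main__":
    deps = [(1, 1), (2, 2), (1, 0)]
    d = dominant_direction(deps)
    print("dominant direction:", d)
    print("cross-processor dependences:", cross_processor_deps(4, deps, d, 4))
```

In this sketch, every dependence parallel to the chosen line stays on one processor, so only the remaining vectors (here `(1, 0)`) require communication. That is the intuition behind preferring the line that contains the most dependence vectors.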
