Maximizing parallelism and minimizing synchronization with affine transforms

This paper presents the first algorithm to find the optimal affine transform that maximizes the degree of parallelism while minimizing the degree of synchronization in a program with arbitrary loop nestings and affine data accesses. The problem is formulated without the use of imprecise data dependence abstractions such as data dependence vectors. The algorithm presented subsumes previously proposed program transformation algorithms that are based on unimodular transformations, loop fusion, fission, scaling, reindexing and/or statement reordering.
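As a minimal illustration of what an affine transform with no synchronization looks like, consider a hypothetical two-deep loop nest in which iteration (i, j) reads the value written by iteration (i-1, j-1). The sketch below is not the paper's algorithm. It assumes a hand-picked affine partition p = i - j (the names `partition`, `sequential`, and `partitioned` are illustrative only) and checks that running each partition independently reproduces the sequential result.

```python
from collections import defaultdict


def partition(i, j):
    # Hypothetical affine partition mapping: processor id p = i - j.
    # Iterations (i, j) and (i-1, j-1) get the same p, so the only
    # dependence in the loop stays inside one partition.
    return i - j


def sequential(n):
    # Reference execution of the original loop nest.
    a = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = a[i - 1][j - 1] + 1
    return a


def partitioned(n):
    # Group iterations by partition id, then run each group on its own.
    a = [[0] * (n + 1) for _ in range(n + 1)]
    groups = defaultdict(list)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            groups[partition(i, j)].append((i, j))
    # Groups are visited in reverse order to show that the order of
    # groups does not matter; only the order inside a group is kept.
    for p in sorted(groups, reverse=True):
        for i, j in groups[p]:
            a[i][j] = a[i - 1][j - 1] + 1
    return a, len(groups)


if __name__ == "__main__":
    result, num_partitions = partitioned(5)
    print(result == sequential(5), num_partitions)
```

Here each value of p identifies a diagonal of the iteration space, so an n-by-n nest yields 2n - 1 independent partitions that could run in parallel without exchanging data. Loop nests whose dependences cannot all be kept inside one partition need some synchronization, which is the trade-off the abstract refers to.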
