Implementing Flexible Computation Rules with Subexpression-level Loop Transformation

Computation Decomposition and Alignment (CDA) is a new loop transformation framework that extends both the linear loop transformation framework and the more recently proposed Computation Alignment frameworks by linearly transforming computations at the granularity of subexpressions. It can be applied to achieve a number of optimization objectives, including removing data alignment constraints, eliminating ownership tests, reducing cache conflicts, and improving data access locality.
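To make the idea of transforming at subexpression granularity concrete, the following minimal sketch decomposes one statement into a subexpression and a remainder, then shifts (aligns) the subexpression to a different iteration. It is an illustration under stated assumptions, not the algorithm from the paper: the loop body, the array names `A`, `B`, `C`, `D`, the temporary `T`, and both function names are hypothetical, and the alignment is chosen by hand rather than derived by a compiler.

```python
def original(B, C, D):
    """Reference loop: the whole statement runs at iteration i."""
    n = len(B)
    A = [0] * n
    for i in range(1, n):
        # Operands C[i-1] and D[i-1] are offset from the written element A[i].
        A[i] = B[i] + C[i - 1] * D[i - 1]
    return A


def decomposed_and_aligned(B, C, D):
    """Hand-written sketch of decomposition plus alignment (hypothetical).

    Decomposition: the subexpression C[i-1] * D[i-1] is split out into a
                   temporary T.
    Alignment:     that subexpression is moved to iteration j = i - 1, so it
                   reads only C[j] and D[j], the elements indexed by its own
                   iteration.
    """
    n = len(B)
    A = [0] * n
    T = [0] * n  # temporary array holding the subexpression results
    for j in range(n):
        if j <= n - 2:
            # Subexpression, now aligned with the data it reads.
            T[j] = C[j] * D[j]
        if j >= 1:
            # Remainder of the statement; T[j-1] was produced one iteration earlier.
            A[j] = B[j] + T[j - 1]
    return A


if __name__ == "__main__":
    B = [1, 2, 3, 4, 5]
    C = [2, 3, 4, 5, 6]
    D = [7, 8, 9, 10, 11]
    print(original(B, C, D) == decomposed_and_aligned(B, C, D))
```

In the transformed loop, the only offset access left is to the temporary `T`; under an owner-computes style mapping, which is an assumption here, that is the kind of change that could relax alignment constraints on `C` and `D` relative to `A`.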
