Multi-dimensional incremental loop fusion for data locality

Affine loop transformations are widely used for program optimization, but they usually target single loop nests. A few recent approaches also handle whole programs with multiple loop nests, but they do not scale to realistic applications with dozens of nests. To reduce complexity, we split each affine transformation into a linear transformation step and a translation step. The translation step can be used to perform general multi-dimensional loop fusion. We show that loop fusion can be performed incrementally and provide a greedy algorithm, which we illustrate on a simple example. Finally, we present a heuristic for data locality and provide some experimental results.
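
To make the role of the translation step concrete, the following is a minimal one-dimensional sketch, not the algorithm of the paper. The function names `separate` and `fused`, the `shift` parameter, and the producer/consumer statements are all assumptions chosen for illustration. A producer nest writes `a[i]`, and a consumer nest reads `a[j - 1]` and `a[j + 1]`. Fusing the two nests into one loop is only legal if the consumer is translated (shifted) by at least one iteration, so that `a[j + 1]` has already been written when it is read.

```python
def separate(n):
    """Reference version: two separate loop nests, producer then consumer."""
    a = [0] * n
    b = [0] * n
    for i in range(n):             # nest 1: produce a[i]
        a[i] = i * i
    for j in range(1, n - 1):      # nest 2: consume a[j - 1] and a[j + 1]
        b[j] = a[j - 1] + a[j + 1]
    return b


def fused(n, shift=1):
    """Fused version: both nests share one loop over a common time axis t.

    Nest 1 runs at t = i; nest 2 is translated so it runs at t = j + shift.
    The shift value is an assumption of this sketch; shift >= 1 is needed
    here because iteration j of nest 2 reads a[j + 1].
    """
    a = [0] * n
    b = [0] * n
    for t in range(n + shift):
        if t < n:                  # statement of nest 1 at iteration i = t
            a[t] = t * t
        j = t - shift              # iteration of nest 2 mapped to time t
        if 1 <= j < n - 1:         # guard keeps nest 2 inside its own bounds
            b[j] = a[j - 1] + a[j + 1]
    return b


if __name__ == "__main__":
    # A legal translation preserves the result of the original program.
    assert fused(8, shift=1) == separate(8)
    # Without translation the consumer reads a[j + 1] before it is written.
    assert fused(8, shift=0) != separate(8)
```

In the fused loop with `shift=1`, each `a[i]` is consumed within two iterations of being produced, instead of after the whole producer nest has finished, which is the kind of locality gain that fusion aims for.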
