Enhancing Spatial Locality using Data Layout Optimizations

This paper aims to improve locality of reference by choosing suitable array layouts. We use a new definition of spatial reuse vectors that takes the memory layout of arrays into account. This definition creates two opportunities. First, it allows us to develop an array restructuring framework based on a combination of hyperplane theory and reuse vectors. Second, it allows us to observe the effect of different array layout optimizations on spatial reuse vectors. Because iteration-space-based locality optimizations also change the spatial reuse vectors, our approach lets us compare the iteration-space-based and data-space-based approaches in terms of their effects on spatial reuse vectors. We illustrate the effectiveness of our technique using two benchmark examples on two distributed shared-memory machines, the Convex Exemplar and the SGI Origin.
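To make the idea of a layout-dependent spatial reuse test concrete, the following is a minimal sketch and not the paper's framework. It assumes a convention in which a two-dimensional layout is described by a hyperplane vector g, with (1, 0) standing for row-major, (0, 1) for column-major, and (1, -1) for a diagonal layout, and in which two elements are stored close together when they lie on the same hyperplane. The names `innermost_step` and `has_spatial_reuse` are hypothetical, and the test ignores cache-line size and array bounds.

```python
# Sketch only: the hyperplane convention and all names here are assumptions
# made for illustration, not definitions taken from the paper.

ROW_MAJOR = (1, 0)      # elements with equal first subscript are contiguous
COLUMN_MAJOR = (0, 1)   # elements with equal second subscript are contiguous
DIAGONAL = (1, -1)      # elements with equal (first - second) are contiguous


def innermost_step(access):
    """Change in the array subscripts when the innermost loop advances by 1.

    `access` is the access matrix of a reference: one row per array
    dimension, one column per loop (outermost first).
    """
    return [row[-1] for row in access]


def has_spatial_reuse(access, layout):
    """True if successive innermost iterations stay on one layout hyperplane."""
    step = innermost_step(access)
    return sum(g * s for g, s in zip(layout, step)) == 0


# Loop nest: for i: for j: ... (j is innermost)
transposed = [[0, 1],   # U[j][i]: first subscript is j
              [1, 0]]   #          second subscript is i
skewed = [[1, 1],       # V[i+j][j]: first subscript is i + j
          [0, 1]]       #            second subscript is j

for name, access in (("U[j][i]", transposed), ("V[i+j][j]", skewed)):
    for label, g in (("row-major", ROW_MAJOR),
                     ("column-major", COLUMN_MAJOR),
                     ("diagonal", DIAGONAL)):
        print(name, label, has_spatial_reuse(access, g))
```

Under these assumptions, changing the layout vector changes whether the innermost loop direction counts as a spatial reuse direction, which is the kind of effect the abstract describes observing for data-space transformations.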
