An Integer Linear Programming Approach for Optimizing Cache Locality

The actual performance of programs on modern processors that employ deep memory hierarchies is closely related to the performance of the memory subsystem. Compiler optimizations aimed at improving cache locality are critical in realizing the performance potential of powerful processors. For scientific applications, several loop transformations have been shown to be useful in improving both temporal and spatial locality. Recently, there has been some work in the area of data layout optimizations, i.e., changing the memory layouts of multi-dimensional arrays from the language-defined default such as column-major storage in Fortran. These memory layout optimizations affect the spatial locality characteristics of loop nests. This paper presents a technique based on integer linear programming (ILP) that attempts to derive the best combination of loop and data layout transformations. Prior attempts to unify loop and data layout transformations for programs consisting of a sequence of loop nests have been based on heuristics not only for transformations for a single loop nest but also for the sequence in which loop nests will be considered. The ILP formulation presented here obviates the need for such heuristics. Experimental results on a MIPS R10000 based system demonstrate the benefits of this approach, and show that the use of the ILP formulation does not increase the compilation time significantly.
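To make the combined search space concrete, the following is a minimal sketch and not the paper's formulation. It assumes a hypothetical program of three two-deep loop nests over arrays `A`, `B`, and `C`, a toy cost model that charges one unit for every reference lacking spatial locality in the innermost loop, and exhaustive enumeration of the 0-1 choices in place of a real ILP solver. All names (`NESTS`, `cost`, `best_assignment`) are illustrative assumptions.

```python
from itertools import product

# Hypothetical program: each nest lists its array references as
# (array, subscripts), where subscripts gives the loop index used
# in each array dimension.
NESTS = {
    "nest1": [("A", ("i", "j")), ("B", ("j", "i"))],
    "nest2": [("A", ("i", "j")), ("C", ("i", "j"))],
    "nest3": [("B", ("i", "j")), ("C", ("j", "i"))],
}
ARRAYS = ["A", "B", "C"]
LAYOUTS = ("row_major", "col_major")   # one layout choice per array
INNER_LOOPS = ("i", "j")               # one innermost-loop choice per nest


def has_spatial_locality(subscripts, layout, inner):
    # The fastest-varying dimension is the last one for row-major storage
    # and the first one for column-major storage.
    fastest = subscripts[-1] if layout == "row_major" else subscripts[0]
    return fastest == inner


def cost(layout_of, inner_of):
    # Toy cost model (an assumption): one unit per reference whose
    # fastest-varying dimension is not indexed by the innermost loop.
    return sum(
        0 if has_spatial_locality(subs, layout_of[arr], inner_of[nest]) else 1
        for nest, refs in NESTS.items()
        for arr, subs in refs
    )


def best_assignment():
    # Exhaustive search over all layout and loop-order choices; an ILP
    # solver would explore the same 0-1 decision space without enumeration.
    best = None
    for layouts in product(LAYOUTS, repeat=len(ARRAYS)):
        layout_of = dict(zip(ARRAYS, layouts))
        for inners in product(INNER_LOOPS, repeat=len(NESTS)):
            inner_of = dict(zip(NESTS, inners))
            c = cost(layout_of, inner_of)
            if best is None or c < best[0]:
                best = (c, layout_of, inner_of)
    return best


if __name__ == "__main__":
    print(best_assignment())
```

In this toy instance, no choice of loop orders removes every penalty while all three arrays keep the same default layout, whereas choosing layouts and loop orders together does. That is the kind of interaction a global formulation is meant to capture without fixing an order in which nests are processed.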
