An Analysis of Loop Permutation on the HP PA-RISC

In modern computers, processors have become significantly faster than main memory. Cache memories are designed to bridge this gap, but they are effective only when programs exhibit data locality. In this report, we present an experiment with compiler optimizations that improve data locality, based on a simple cost model [14]. The model estimates both the temporal and the spatial reuse of cache lines in order to identify desirable loop permutations. It also drives the application of compound transformations consisting of loop permutation, loop distribution, and loop reversal.
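To make the idea of a reuse-based cost model concrete, the following is a minimal sketch, not the report's actual model. It assumes column-major (Fortran-style) arrays, a hypothetical cache line holding 4 array elements, and hypothetical trip counts of 100. The names `ref_cost`, `loop_cost`, and `rank_loops` are illustrative only. For each candidate innermost loop, a reference costs 1 cache line if it is invariant in that loop (temporal reuse), `trip / line_elems` lines if the loop indexes only the contiguous dimension (spatial reuse), and `trip` lines otherwise. The sketch ignores reference grouping, non-unit strides, loop distribution, and loop reversal.

```python
def ref_cost(subscripts, loop, trip, line_elems):
    """Estimated cache lines touched by one reference if `loop` is innermost.

    `subscripts` lists the loop variable used in each array dimension,
    contiguous (fastest-varying) dimension first. This is an assumption
    of the sketch: column-major storage, one loop variable per dimension.
    """
    if loop not in subscripts:
        return 1                      # loop-invariant: temporal reuse
    if subscripts[0] == loop and loop not in subscripts[1:]:
        return trip / line_elems      # unit stride: spatial reuse
    return trip                       # new cache line on every iteration


def loop_cost(refs, trips, candidate, line_elems):
    """Total estimated cache lines with `candidate` as the innermost loop."""
    outer_iterations = 1
    for name, n in trips.items():
        if name != candidate:
            outer_iterations *= n
    inner = sum(ref_cost(r, candidate, trips[candidate], line_elems) for r in refs)
    return inner * outer_iterations


def rank_loops(refs, trips, line_elems):
    """Order loops from outermost to innermost: cheapest loop goes innermost."""
    return sorted(
        trips,
        key=lambda name: loop_cost(refs, trips, name, line_elems),
        reverse=True,
    )


# Hypothetical example: C(i,j) = C(i,j) + A(i,k) * B(k,j)
refs = [("i", "j"), ("i", "k"), ("k", "j")]   # C, A, B
trips = {"i": 100, "j": 100, "k": 100}

print(rank_loops(refs, trips, line_elems=4))  # ['j', 'k', 'i']
```

Under these assumptions the sketch places `i` innermost, because two of the three references then walk along the contiguous dimension and the third is invariant. Whether a compiler may actually apply the suggested permutation still depends on the loop nest's data dependences, which this sketch does not check.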

[1]  Ken Kennedy, et al.  Scalar replacement in the presence of conditional control flow, 1994, Softw. Pract. Exp.

[2]  G. G. Stokes.  "J.", 1890, The New Yale Book of Quotations.

[3]  Chau-Wen Tseng, et al.  Compiler optimizations for improving data locality, 1994, ASPLOS VI.

[4]  Anoop Gupta, et al.  Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.

[5]  Ken Kennedy, et al.  Estimating Interlock and Improving Balance for Pipelined Architectures, 1988, J. Parallel Distributed Comput.

[6]  Ken Kennedy, et al.  A Methodology for Procedure Cloning, 1993, Computer Languages.

[7]  Monica S. Lam, et al.  A data locality optimizing algorithm, 1991, PLDI '91.

[8]  Monica S. Lam, et al.  The cache performance and optimizations of blocked algorithms, 1991.

[9]  Ken Kennedy, et al.  Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPLAS.

[10]  Ken Kennedy, et al.  ParaScope: A Parallel Programming Environment, 1988.

[11]  David L. Kuck, et al.  The Structure of Computers and Computations, 1978.

[12]  David S. Wise.  Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, 1991, PLDI '91.

[13]  Ken Kennedy, et al.  Practical dependence testing, 1991, PLDI '91.

[14]  Ken Kennedy, et al.  Interprocedural transformations for parallel code generation, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[15]  Ken Kennedy, et al.  Improving register allocation for subscripted variables, 1990, PLDI '90.

[16]  Steven Mark Carr, et al.  Memory-hierarchy management, 1993.

[17]  Monica S. Lam, et al.  The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.

[18]  Allan Porterfield, et al.  Data cache performance of supercomputer applications, 1990, Proceedings SUPERCOMPUTING '90.