Locality Enhancement by Array Contraction

This paper studies how array contraction can enhance cache locality and improve performance. Our previous work developed SFC, a memory minimization scheme that combines loop shifting, loop fusion, and array contraction. SFC focuses on reducing the memory requirement and, as a by-product, may enhance cache locality. Here we develop a memory cost model for SFC and present a fusion algorithm that allows the predicted locality enhancement to be realized. Experimental results on both a real machine and a simulator demonstrate that array contraction is effective at enhancing cache locality and improving performance.
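To make the three transformations concrete, the following minimal sketch applies loop shifting, loop fusion, and array contraction to a toy two-loop program. It is an illustration only, not the paper's SFC algorithm or its cost model; the function names and the computation itself are hypothetical. The second loop reads `a[i + 1]`, so the loops cannot be fused directly. Shifting the consumer by one iteration makes fusion legal, after which the temporary array `a` shrinks to two scalars.

```python
def unfused(b):
    """Original form: two loops communicating through a full-size temporary array."""
    n = len(b)
    a = [0] * n
    for i in range(n):              # producer loop
        a[i] = 2 * b[i]
    c = [0] * max(n - 1, 0)
    for i in range(n - 1):          # consumer loop reads a[i] and a[i + 1]
        c[i] = a[i] + a[i + 1]
    return c


def shifted_fused_contracted(b):
    """After shifting the consumer by one iteration, fusing, and contracting a."""
    n = len(b)
    c = [0] * max(n - 1, 0)
    prev = 0                        # stands in for a[i - 1]
    for i in range(n):
        cur = 2 * b[i]              # stands in for a[i]
        if i >= 1:
            # Consumer iteration i - 1, delayed by one fused iteration.
            c[i - 1] = prev + cur
        prev = cur
    return c
```

In the contracted version, each produced value is consumed one iteration later while it is still in a register or in cache, and the O(n) temporary is gone. This is the kind of effect the abstract attributes to array contraction.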
