Data-centric multi-level blocking

We present a simple and novel framework for generating blocked codes for high-performance machines with a memory hierarchy. Unlike traditional compiler techniques like tiling, which are based on reasoning about the control flow of programs, our techniques are based on reasoning directly about the flow of data through the memory hierarchy. Our data-centric transformations permit a more direct solution to the problem of enhancing data locality than current control-centric techniques do, and generalize easily to multiple levels of memory hierarchy. We buttress these claims with performance numbers for standard benchmarks from the problem domain of dense numerical linear algebra. The simplicity and intuitive appeal of our approach should make it attractive to compiler writers as well as to library writers.
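
For orientation, the conventional control-centric tiling that the abstract contrasts against can be sketched as follows. This is a minimal illustration of standard single-level loop tiling for dense matrix multiplication, not the data-centric framework itself. The function names `matmul_naive` and `matmul_blocked` and the `block` parameter are hypothetical, chosen only for this example.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A * B for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Tiled version: the three outer loops walk over block x block tiles,
    the three inner loops do the work inside one tile. `block` is an
    assumed tuning parameter; in practice it would be chosen so that the
    tiles in use fit in one level of the memory hierarchy."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for jj in range(0, p, block):
            for kk in range(0, m, block):
                # min(...) handles edge tiles when sizes are not multiples of block
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, p)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + block, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

Both functions compute the same product; only the order in which array elements are touched differs. Handling several levels of memory hierarchy in this style means nesting further layers of tile loops, which is the kind of extension the abstract says the data-centric approach generalizes to easily.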
