Paper information - An experimental evaluation of tiling and shackling for memory hierarchy management


On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring compiler community has developed locality-enhancing program transformations, the most well known of which is loop tiling. Tiling is restricted to perfectly nested loops, but many imperfectly nested loops can be transformed into perfectly nested loops that can then be tiled. Recently, we proposed an alternative approach to locality enhancement called data shackling. Data shackling reasons about data traversals rather than iteration space traversals, and can be applied directly to imperfectly nested loops. We have implemented shackling in the SGI MIPSPro compiler, which already has a sophisticated implementation of tiling. Our experiments on the SGI Octane workstation with dense numerical linear algebra programs show that shackled code obtains double the performance of tiled code for most of these programs, and obtains five times the performance of tiled code for some versions of Cholesky factorization. Data shackling has been integrated into the SGI MIPSPro compiler product line.
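For readers unfamiliar with loop tiling, the following is a minimal sketch of the idea applied to a perfectly nested matrix-multiplication loop. It is an illustration only, not code from the paper or from the MIPSPro compiler; the function names and the `tile` parameter are hypothetical. It does not show data shackling, which the abstract describes only at a high level.

```python
def matmul_naive(a, b, n):
    """Untiled triple loop: a perfectly nested loop nest over (i, j, k)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=4):
    """Tiled version: the iteration space is cut into tile x tile x tile blocks.

    The outer three loops walk over blocks; the inner three loops do the
    same work as the naive version, restricted to one block. On real
    hardware the block size would be chosen so that the touched parts of
    a, b and c fit in cache (assumption: `tile` is just an illustrative knob).
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # min(...) handles the case where n is not a multiple of tile.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for k in range(kk, min(kk + tile, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c
```

Both functions compute the same product; only the order in which iterations are visited differs. Pure Python will not show the cache effect, so the sketch is meant to convey the loop structure rather than to serve as a benchmark.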
