Multi-level tiling: M for the price of one

Tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality. High-performance implementations use multiple levels of tiling to exploit the hierarchy of parallelism and of cache/register locality, so efficient generation of multi-level tiled code is essential for using multi-level tiling effectively. Parameterized tiled code, in which tile sizes are not fixed but left as symbolic parameters, can enable several dynamic and run-time optimizations. Previous solutions to multi-level tiled loop generation are limited to the case where tile sizes are fixed at compile time. We present an algorithm that generates multi-level parameterized tiled loops at the same cost as generating single-level tiled loops, and we demonstrate its efficiency on several benchmarks. We also present a method, useful in register tiling, for separating partial and full tiles at any arbitrary level of tiling. The code generator we have implemented is available as an open-source tool.
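The abstract describes the shape of multi-level parameterized tiled code and of full/partial tile separation without showing it. As an illustration only, here is a hand-written C sketch; it is neither the authors' algorithm nor the output of their generator. It assumes a rectangular 2D iteration space `0 <= i < n, 0 <= j < m` (the paper addresses more general loop nests) and two tiling levels whose sizes `t1i, t1j` (outer) and `t2i, t2j` (inner) are ordinary run-time arguments. Inner tiles whose extents are exactly `t2i` by `t2j` take a separate path whose point loops need no `min()` bounds, which is what makes full tiles convenient for register tiling. All names (`two_level_tiled`, `full_tile`, `partial_tile`, `all_visited_once`) are hypothetical, and the loop body is a placeholder that counts visits.

```c
#include <assert.h>

/* Sketch only (assumption): rectangular 2D iteration space
   0 <= i < n, 0 <= j < m, two tiling levels, tile sizes known only at run time. */

/* Smaller of two ints; clamps tile bounds to the enclosing tile or domain. */
static int imin(int a, int b) { return a < b ? a : b; }

/* Full inner tile: trip counts are exactly t2i and t2j, so no min() bounds. */
static void full_tile(int ii, int jj, int t2i, int t2j, int m, int *count) {
    for (int i = ii; i < ii + t2i; i++)
        for (int j = jj; j < jj + t2j; j++)
            count[i * m + j] += 1; /* placeholder body: count visits */
}

/* Partial inner tile: bounds ie/je were already clamped by the caller. */
static void partial_tile(int ii, int jj, int ie, int je, int m, int *count) {
    for (int i = ii; i < ie; i++)
        for (int j = jj; j < je; j++)
            count[i * m + j] += 1; /* placeholder body: count visits */
}

/* Hypothetical two-level parameterized tiling of the 2D loop nest. */
void two_level_tiled(int n, int m, int t1i, int t1j, int t2i, int t2j,
                     int *count, long *n_full, long *n_partial) {
    assert(t1i > 0 && t1j > 0 && t2i > 0 && t2j > 0);
    *n_full = 0;
    *n_partial = 0;
    /* Level 1: outer tile loops over the whole domain. */
    for (int it = 0; it < n; it += t1i) {
        int i_hi = imin(it + t1i, n);
        for (int jt = 0; jt < m; jt += t1j) {
            int j_hi = imin(jt + t1j, m);
            /* Level 2: inner tile loops, confined to the current outer tile. */
            for (int ii = it; ii < i_hi; ii += t2i) {
                int ie = imin(ii + t2i, i_hi);
                for (int jj = jt; jj < j_hi; jj += t2j) {
                    int je = imin(jj + t2j, j_hi);
                    if (ie - ii == t2i && je - jj == t2j) {
                        full_tile(ii, jj, t2i, t2j, m, count);
                        (*n_full)++;
                    } else {
                        partial_tile(ii, jj, ie, je, m, count);
                        (*n_partial)++;
                    }
                }
            }
        }
    }
}

/* Returns 1 if each of the num points was executed exactly once, else 0. */
int all_visited_once(const int *count, int num) {
    for (int k = 0; k < num; k++)
        if (count[k] != 1)
            return 0;
    return 1;
}
```

For example, `two_level_tiled(10, 7, 4, 4, 2, 3, cnt, &full, &partial)` visits all 70 points once and reports 10 full and 5 partial inner tiles: in `j`, the outer tile `[0,4)` splits into a full 3-wide inner tile and a 1-wide partial one, the outer tile `[4,7)` holds one full inner tile, and all 5 inner tiles in `i` are full. In this sketch, each further level of tiling would add one more pair of tile loops clamped to its parent tile; the paper's contribution is generating such code for general loop nests at the cost of a single level, which the sketch does not attempt to reproduce.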