Near-optimal loop tiling by means of cache miss equations and genetic algorithms

The effectiveness of the memory hierarchy is critical to the performance of current processors. It can be improved by program transformations such as loop tiling, a code transformation aimed at reducing capacity misses. This paper presents a novel, systematic approach to near-optimal loop tiling that combines an accurate data locality analysis (cache miss equations) with a genetic algorithm that searches the solution space. The results show that this approach removes practically all capacity misses for all benchmarks considered. The reduction in replacement misses lowers the miss ratio by as much as a factor of 7 for the matrix multiply kernel.
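For readers unfamiliar with the transformation, the following is a minimal sketch of loop tiling applied to a matrix multiply kernel. It is an illustration only, not the paper's implementation: the function name `matmul_tiled` and the tile-size parameters `ti`, `tj`, and `tk` are assumptions introduced here. Tile sizes like these are the values a search procedure would tune.

```python
def matmul_tiled(a, b, n, ti, tj, tk):
    """Multiply two n x n matrices (lists of lists) with ti x tj x tk tiles.

    Hypothetical illustration: ti, tj, tk are the tile sizes along the
    i, j and k loops. Tiling only reorders the iterations, so the result
    is the same as for the untiled triple loop.
    """
    c = [[0] * n for _ in range(n)]
    # Outer loops step from tile to tile.
    for ii in range(0, n, ti):
        for kk in range(0, n, tk):
            for jj in range(0, n, tj):
                # Inner loops stay inside one tile, so the same small
                # blocks of a, b and c are reused while they are likely
                # to still be in cache. min() handles partial edge tiles.
                for i in range(ii, min(ii + ti, n)):
                    for k in range(kk, min(kk + tk, n)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + tj, n)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

Calling `matmul_tiled(a, b, n, n, n, n)` degenerates to the untiled loop nest. Smaller tiles shrink the working set touched between reuses of the same data, which is how tiling targets capacity misses.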
