Exact analysis of the cache behavior of nested loops

We develop from first principles an exact model of the behavior of loop nests executing in a memory hierarchy, using a nontraditional classification of misses whose key property is composability. We use Presburger formulas to express the various kinds of misses as well as the state of the cache at the end of the loop nest, and we use existing tools to simplify these formulas and to count cache misses. The model is powerful enough to handle imperfect loop nests and several flavors of non-linear array layouts based on bit interleaving of array indices. We also indicate how to handle modest levels of associativity and how to perform limited symbolic analysis of cache behavior. Because the complexity of the formulas depends on the static structure of the loop nest rather than on its dynamic trip count, the model can count cache misses efficiently by exploiting repetitive patterns of cache behavior. Validation against cache simulation confirms the exactness of our formulation. Our method can serve as the basis for a static performance predictor that guides program and data transformations toward better performance.
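To illustrate how a cache miss can be stated as a Presburger formula, here is one standard way to write the reuse condition for a direct-mapped cache. It is a sketch under stated assumptions and not necessarily the exact formulation used in this work. Assume that accesses t are ordered by execution order ≺ (lexicographic order on statement and iteration vectors), that a(t) is the byte address touched by access t, that L is the line size in bytes, and that C is the number of cache sets. The access t hits regardless of the cache contents at entry to the nest exactly when some earlier access in the nest touched the same memory block and no access in between touched a different block that maps to the same set:

```latex
B(t) = \lfloor a(t) / L \rfloor, \qquad S(b) = b \bmod C

\mathrm{Hit}(t) \iff \exists t' \Big[\, t' \prec t \,\wedge\, B(t') = B(t) \,\wedge\,
  \neg \exists t'' \big(\, t' \prec t'' \prec t \,\wedge\, S(B(t'')) = S(B(t)) \,\wedge\, B(t'') \neq B(t) \big) \Big]

b = \lfloor a / L \rfloor \iff \exists r \, ( a = L b + r \,\wedge\, 0 \le r < L ), \qquad
s = b \bmod C \iff \exists q \, ( b = C q + s \,\wedge\, 0 \le s < C )
```

Because L and C are constants, the last line removes floor and modulo through existentially quantified integers, so the condition stays within Presburger arithmetic whenever a(t) is affine in the loop indices. If an earlier access to the same block exists but the condition fails, the access misses regardless of the entry state. If no earlier access to the block occurs inside the nest, the outcome depends on the cache contents at entry.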

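Validation against simulation needs a reference miss count. As a minimal sketch of that idea, the following trace-driven simulator models a direct-mapped cache and applies it to a two-deep loop nest that reads every element of an n x n array. It compares a row-major layout with a bit-interleaved (Morton, or Z-order) layout. This is not the authors' model or their simulator. The cache is assumed to start empty, the array is assumed to start at address 0, and the helper names (row_major_index, morton_index, trace, count_misses, compare_layouts) and default parameters (an 8 x 8 array of 8-byte elements, 32-byte lines, 8 sets) are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple


def row_major_index(i: int, j: int, n: int) -> int:
    """Element offset of A[i][j] in a row-major n x n array."""
    return i * n + j


def morton_index(i: int, j: int, bits: int) -> int:
    """Element offset of A[i][j] in a Morton (Z-order) layout.

    Bit k of the column index j goes to bit 2k of the offset, and
    bit k of the row index i goes to bit 2k+1.
    """
    z = 0
    for k in range(bits):
        z |= ((j >> k) & 1) << (2 * k)
        z |= ((i >> k) & 1) << (2 * k + 1)
    return z


def trace(n: int, elem_size: int, layout: Callable[[int, int], int],
          column_order: bool) -> Iterator[int]:
    """Byte addresses read by a 2-deep loop nest over A[i][j] (base address 0).

    column_order=False: i outer, j inner; column_order=True: j outer, i inner.
    """
    for outer in range(n):
        for inner in range(n):
            i, j = (inner, outer) if column_order else (outer, inner)
            yield layout(i, j) * elem_size


def count_misses(addresses: Iterable[int], num_sets: int, line_size: int) -> int:
    """Simulate a direct-mapped cache that starts empty and count its misses."""
    held: List[Optional[int]] = [None] * num_sets  # memory block held by each set
    misses = 0
    for addr in addresses:
        block = addr // line_size  # memory block containing this address
        s = block % num_sets       # the only set this block can occupy
        if held[s] != block:
            misses += 1
            held[s] = block        # the incoming block evicts the previous one
    return misses


def compare_layouts(n: int = 8, elem_size: int = 8, line_size: int = 32,
                    num_sets: int = 8) -> Dict[Tuple[str, str], int]:
    """Miss counts for row- and column-order traversals under both layouts."""
    bits = n.bit_length() - 1  # assumes n is a power of two
    layouts = {
        "row-major": lambda i, j: row_major_index(i, j, n),
        "morton": lambda i, j: morton_index(i, j, bits),
    }
    results: Dict[Tuple[str, str], int] = {}
    for name, layout in layouts.items():
        for order in ("row", "column"):
            addrs = trace(n, elem_size, layout, column_order=(order == "column"))
            results[(name, order)] = count_misses(addrs, num_sets, line_size)
    return results


if __name__ == "__main__":
    for (name, order), misses in compare_layouts().items():
        print(f"{name:9s} {order:6s} traversal: {misses} misses")
```

With these parameters, the column-order traversal misses on every access (64 misses) under the row-major layout but misses only 32 times under the Morton layout, because each 32-byte line holds a 2 x 2 tile. Both layouts incur 16 misses for the row-order traversal. An exact analytical model should reproduce such counts without enumerating the trace.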