Exact analysis of the cache behavior of nested loops
Siddhartha Chatterjee | Alvin R. Lebeck | Philip J. Hanlon | Erin Parker
[1] Keshav Pingali,et al. Tiling Imperfectly-nested Loop Nests , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[2] James R. Larus,et al. Wisconsin Architectural Research Tool Set , 1993, CARN.
[3] Sally A. McKee,et al. Caches As Filters: A Unifying Model for Memory Hierarchy Analysis , 2000 .
[4] Wei Li,et al. Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.
[5] Mithuna Thottethodi,et al. Recursive array layouts and fast parallel matrix multiplication , 1999, SPAA '99.
[6] Sally A. McKee,et al. Caches as filters: a new approach to cache analysis , 1998, Proceedings. Sixth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.98TB100247).
[7] Keshav Pingali,et al. Tiling Imperfectly-nested Loop Nests (REVISED) , 2000 .
[8] Olivier Temam,et al. Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply , 1995, TOPL.
[9] Gaetano Borriello,et al. Symbolic timing verification of timing diagrams using Presburger formulas , 1997, DAC.
[10] David Padua,et al. Compile-time performance prediction of scientific programs , 2000 .
[11] Michael Wolfe,et al. More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).
[12] Keshav Pingali,et al. Locality Enhancement of Imperfectly-Nested Loop Nests , 2000 .
[13] Chau-Wen Tseng,et al. Compiler optimizations for improving data locality , 1994, ASPLOS VI.
[14] Steven W. K. Tjiang,et al. SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGP.
[15] Derek C. Oppen,et al. A 2^2^2^pn Upper Bound on the Complexity of Presburger Arithmetic , 1978, J. Comput. Syst. Sci..
[16] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[17] Uwe Schöning. Complexity of Presburger Arithmetic with Fixed Quantifier Dimension , 1997, Theory Comput. Syst..
[18] S. Abraham,et al. Efficient Simulation of Multiple Cache Configurations Using Binomial Trees , 1991 .
[19] Reinhard Wilhelm,et al. Cache Behavior Prediction by Abstract Interpretation , 1996, SAS.
[20] Larry Carter,et al. Quantifying the Multi-level Nature of Tiling Interactions , 1997, LCPC.
[21] Graham R. Nudd,et al. Analytical Modeling of Set-Associative Cache Behavior , 1999, IEEE Trans. Computers.
[22] Reinhard Wilhelm,et al. Cache Behavior Prediction by Abstract Interpretation , 1996, Sci. Comput. Program..
[23] David A. Wood,et al. A model for estimating trace-sample miss ratios , 1991, SIGMETRICS '91.
[24] Sharad Malik,et al. Precise miss analysis for program transformations with caches of arbitrary associativity , 1998, ASPLOS VIII.
[25] David A. Wood,et al. Active Memory: A New Abstraction for Memory System Simulation , 1997, ACM Trans. Model. Comput. Simul..
[26] Siddhartha Chatterjee,et al. The Combinatorics of Cache Misses during Matrix Multiplication , 2001, J. Comput. Syst. Sci..
[27] William Pugh,et al. The Omega Library interface guide , 1995 .
[28] Mark Horowitz,et al. An analytical cache model , 1989, TOCS.
[29] Somnath Ghosh,et al. Cache Miss Equations: Compiler Analysis Framework for Tuning Memory Behavior , 2001, PPSC.
[30] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[31] Keshav Pingali,et al. Automatic Generation of Block-Recursive Codes , 2000, Euro-Par.
[32] Siddhartha Chatterjee,et al. Cache-efficient matrix transposition , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[33] Alan Jay Smith,et al. Evaluating Associativity in CPU Caches , 1989, IEEE Trans. Computers.
[34] Philippe Clauss,et al. Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs , 1996 .
[35] Monica S. Lam,et al. Maximizing parallelism and minimizing synchronization with affine transforms , 1997, POPL '97.
[36] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.
[37] Margaret Martonosi,et al. MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.
[38] Alan Eustace,et al. ATOM - A System for Building Customized Program Analysis Tools , 1994, PLDI.
[39] William Pugh,et al. Counting solutions to Presburger formulas: how and why , 1994, PLDI '94.
[40] W. Pugh,et al. A framework for unifying reordering transformations , 1993 .
[41] Harold S. Stone,et al. Footprints in the cache , 1987, TOCS.
[42] Mithuna Thottethodi,et al. Nonlinear array layouts for hierarchical memory systems , 1999, ICS '99.
[43] Pierre Wolper,et al. An Automata-Theoretic Approach to Presburger Arithmetic Constraints (Extended Abstract) , 1995, SAS.
[44] Chau-Wen Tseng,et al. Eliminating conflict misses for high performance architectures , 1998, ICS '98.
[45] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[46] William Pugh,et al. Finding Legal Reordering Transformations Using Mappings , 1994, LCPC.
[47] Volker Weispfenning,et al. Complexity and uniformity of elimination in Presburger arithmetic , 1997, ISSAC.
[48] Margaret Martonosi,et al. A Mathematical Cache Miss Analysis for Pointer Data Structures , 2001, PPSC.
[49] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[50] Vincent Loechner. PolyLib: A Library for Manipulating Parameterized Polyhedra , 1999 .
[51] Hubert Comon-Lundh,et al. Diophantine Equations, Presburger Arithmetic and Finite Automata , 1996, CAAP.
[52] Jeremy D. Frens,et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code , 1997, PPOPP '97.
[53] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[54] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[55] Richard E. Kessler,et al. Page placement algorithms for large real-indexed caches , 1992, TOCS.
[56] Mahmut T. Kandemir,et al. A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts , 1999, IEEE Trans. Parallel Distributed Syst..
[57] Sharad Malik,et al. Cache miss equations: an analytical representation of cache misses , 1997, ICS '97.
[58] Yunheung Paek,et al. Simplification of array access patterns for compiler optimizations , 1998, PLDI.
[59] Olivier Temam,et al. Quantifying loop nest locality using SPEC'95 and the perfect benchmarks , 1999, TOCS.
[60] Gaetano Borriello,et al. Making complex timing relationships readable: Presburger formula simplification using don't cares , 1998, Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175).
[61] Jeffrey D. Ullman,et al. Introduction to Automata Theory, Languages and Computation , 1979 .