Towards a static cache analysis for whole program analysis
[1] Siddhartha Chatterjee,et al. Exact analysis of the cache behavior of nested loops , 2001, PLDI '01.
[2] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[3] Ken Kennedy,et al. Compiler blockability of numerical algorithms , 1992, Proceedings Supercomputing '92.
[4] Dionisios N. Pnevmatikatos,et al. Cache performance of the SPEC92 benchmark suite , 1993, IEEE Micro.
[5] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[6] James R. Larus,et al. Exploiting hardware performance counters with flow and context sensitive profiling , 1997, PLDI '97.
[7] William Jalby,et al. XOR-Schemes: A Flexible Data Organization in Parallel Memories , 1985, ICPP.
[8] José González,et al. The design and performance of a conflict-avoiding cache , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[9] John L. Hennessy,et al. The accuracy of trace-driven simulations of multiprocessors , 1993, SIGMETRICS '93.
[10] Sharad Malik,et al. Cache miss equations: an analytical representation of cache misses , 1997, ICS '97.
[11] Elana D. Granston,et al. A Cache Visualization Tool , 1997, Computer.
[12] Olivier Temam,et al. A quantitative analysis of loop nest locality , 1996, ASPLOS VII.
[13] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.
[14] Ken Kennedy,et al. Analyzing and visualizing performance of memory hierarchies , 1990 .
[15] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[16] Reinhard Wilhelm,et al. Efficient and Precise Cache Behavior Prediction for Real-Time Systems , 1999, Real-Time Systems.
[17] David T. Harper,et al. Vector Access Performance in Parallel Memories Using a Skewed Storage Scheme , 1987, IEEE Transactions on Computers.
[18] David B. Whalley,et al. Bounding worst-case instruction cache performance , 1994, 1994 Proceedings Real-Time Systems Symposium.
[19] Walter L. Smith. Probability and Statistics , 1959, Nature.
[20] John L. Hennessy,et al. Performance debugging shared memory multiprocessor programs with MTOOL , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[21] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[22] David E. Goldberg,et al. Genetic Algorithms in Search Optimization and Machine Learning , 1988 .
[23] Duncan H. Lawrie,et al. The Prime Memory System for Array Access , 1982, IEEE Transactions on Computers.
[24] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[25] Sharad Malik,et al. Efficient microarchitecture modeling and path analysis for real-time software , 1995, Proceedings 16th IEEE Real-Time Systems Symposium.
[26] Olivier Temam,et al. Cache interference phenomena , 1994, SIGMETRICS.
[27] Sally A. McKee,et al. Caches As Filters: A Unifying Model for Memory Hierarchy Analysis , 2000 .
[28] Josep Llosa,et al. Near-Optimal Padding for Removing Conflict Misses , 2002, LCPC.
[29] Trevor N. Mudge,et al. Trace-driven memory simulation: a survey , 1997, CSUR.
[30] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPL.
[31] Constantine D. Polychronopoulos,et al. Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs , 1993, LCPC.
[32] Josep Llosa,et al. Near-optimal loop tiling by means of cache miss equations and genetic algorithms , 2002, Proceedings. International Conference on Parallel Processing Workshop.
[33] Robert C. Bedichek. Talisman: fast and accurate multicomputer simulation , 1995, SIGMETRICS '95/PERFORMANCE '95.
[34] David B. Whalley,et al. Integrating the timing analysis of pipelining and instruction caching , 1995, Proceedings 16th IEEE Real-Time Systems Symposium.
[35] Mendel Rosenblum,et al. Embra: fast and flexible machine simulation , 1996, SIGMETRICS '96.
[36] W. Jalby,et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93.
[37] B. Ramakrishna Rau,et al. Pseudo-randomly interleaved memory , 1991, ISCA '91.
[38] Jingling Xue,et al. Let's study whole-program cache behaviour analytically , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[39] Margaret Martonosi,et al. MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.
[40] Josep Llosa,et al. Optimizing cache miss equations polyhedra , 2000, CARN.
[41] William Pugh,et al. Counting solutions to Presburger formulas: how and why , 1994, PLDI '94.
[42] Olivier Temam,et al. To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93. Proceedings.
[43] Monica S. Lam,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[44] Chau-Wen Tseng,et al. Compiler optimizations for improving data locality , 1994, ASPLOS VI.
[45] Ken Kennedy,et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution , 1993, LCPC.
[46] Yuri Ermoliev,et al. Numerical techniques for stochastic optimization , 1988 .
[47] Philippe Clauss,et al. Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs , 1996 .
[48] Sang Lyul Min,et al. An accurate worst case timing analysis technique for RISC processors , 1994, 1994 Proceedings Real-Time Systems Symposium.
[49] Sang Lyul Min,et al. Efficient worst case timing analysis of data caching , 1996, Proceedings Real-Time Technology and Applications.
[50] Chau-Wen Tseng,et al. Eliminating conflict misses for high performance architectures , 1998, ICS '98.
[51] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[52] Mahmut T. Kandemir,et al. Improving locality using loop and data transformations in an integrated framework , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[53] Jingling Xue,et al. Unimodular Transformations of Non-Perfectly Nested Loops , 1997, Parallel Comput.
[54] Peter S. Magnusson. A Design for Efficient Simulation of a Multiprocessor , 1993, MASCOTS.
[55] Michael E. Wolf,et al. Improving locality and parallelism in nested loops , 1992 .
[56] William Pugh,et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[57] Walid Abu-Sufah,et al. Improving the performance of virtual memory computers. , 1979 .
[58] Emilio L. Zapata,et al. Automatic analytical modeling for the estimation of cache misses , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).
[59] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[60] Michael Wolfe,et al. Advanced Loop Interchanging , 1986, ICPP.
[61] Sharad Malik,et al. Automated cache optimizations using CME driven diagnosis , 2000, ICS '00.
[62] Josep Llosa,et al. An efficient solver for Cache Miss Equations , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS (Cat. No.00EX422).
[63] Reinhard Wilhelm,et al. Cache Behavior Prediction by Abstract Interpretation , 1996, Sci. Comput. Program.
[64] Sharad Malik,et al. Precise miss analysis for program transformations with caches of arbitrary associativity , 1998, ASPLOS VIII.
[65] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[66] Jingling Xue,et al. Reuse-Driven Tiling for Data Locality , 1997, LCPC.
[67] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[68] François Irigoin,et al. Supernode partitioning , 1988, POPL '88.
[69] David B. Whalley,et al. Timing analysis for data caches and set-associative caches , 1997, Proceedings Third IEEE Real-Time Technology and Applications Symposium.
[70] Zbigniew Michalewicz,et al. Genetic Algorithms + Data Structures = Evolution Programs , 1996, Springer Berlin Heidelberg.
[71] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[72] Doran Wilde,et al. A LIBRARY FOR DOING POLYHEDRAL OPERATIONS , 2000 .
[73] David A. Wood,et al. Cache profiling and the SPEC benchmarks: a case study , 1994, Computer.