Efficient and accurate analytical modeling of whole-program data cache behavior
[1] Siddhartha Chatterjee,et al. Exact analysis of the cache behavior of nested loops , 2001, PLDI '01.
[2] W. Jalby,et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts , 1993, Supercomputing '93.
[3] Jingling Xue,et al. Let's study whole-program cache behaviour analytically , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.
[4] Doran Wilde,et al. A Library for Doing Polyhedral Operations , 2000 .
[5] Kathryn S. McKinley,et al. Tile size selection using cache organization and data layout , 1995, PLDI '95.
[6] Jingling Xue,et al. Reuse-Driven Tiling for Improving Data Locality , 1998, International Journal of Parallel Programming.
[7] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[8] Olivier Temam,et al. Quantifying loop nest locality using SPEC'95 and the perfect benchmarks , 1999, TOCS.
[9] Margaret Martonosi,et al. MemSpy: analyzing memory system bottlenecks in programs , 1992, SIGMETRICS '92/PERFORMANCE '92.
[10] William Pugh,et al. Counting solutions to Presburger formulas: how and why , 1994, PLDI '94.
[11] Josep Llosa,et al. A Fast and Accurate Approach to Analyze Cache Memory Behavior (Research Note) , 2000, Euro-Par.
[12] Philippe Clauss,et al. Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs , 1996 .
[13] Christian Lengauer,et al. Loop Parallelization in the Polytope Model , 1993, CONCUR.
[14] Graham R. Nudd,et al. Analytical Modeling of Set-Associative Cache Behavior , 1999, IEEE Trans. Computers.
[15] Scott McFarling,et al. Program optimization for instruction caches , 1989, ASPLOS III.
[16] Josep Llosa,et al. Near-Optimal Padding for Removing Conflict Misses , 2002, LCPC.
[17] Paul Feautrier,et al. Automatic Parallelization in the Polytope Model , 1996, The Data Parallel Programming Model.
[18] Sharad Malik,et al. Cache miss equations: a compiler framework for analyzing and tuning memory behavior , 1999, TOPLAS.
[19] Emilio L. Zapata,et al. Modeling set associative caches behavior for irregular computations , 1998, SIGMETRICS '98/PERFORMANCE '98.
[20] Josep Llosa,et al. An efficient solver for Cache Miss Equations , 2000, 2000 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[21] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[22] Josep Torrellas,et al. Optimizing the Instruction Cache Performance of the Operating System , 1998, IEEE Trans. Computers.
[23] P. Feautrier. Parametric integer programming , 1988 .
[24] William Pugh,et al. A practical algorithm for exact array dependence analysis , 1992, CACM.
[25] Trishul M. Chilimbi. Efficient representations and abstractions for quantifying and exploiting data reference locality , 2001, PLDI '01.
[26] M. DeGroot,et al. Probability and Statistics .
[27] Dennis Gannon,et al. Strategies for cache and local memory management by global program transformation , 1988, J. Parallel Distributed Comput.
[28] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[29] Richard E. Kessler,et al. Page placement algorithms for large real-indexed caches , 1992, TOCS.
[30] Mahmut T. Kandemir,et al. A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts , 1999, IEEE Trans. Parallel Distributed Syst.
[31] David A. Patterson,et al. Computer architecture (2nd ed.): a quantitative approach , 1996 .
[32] Vivek Sarkar,et al. On Estimating and Enhancing Cache Effectiveness , 1991, LCPC.
[33] Urs Hölzle,et al. Eliminating Virtual Function Calls in C++ Programs , 1996, ECOOP.
[34] Chau-Wen Tseng,et al. Improving data locality with loop transformations , 1996, TOPLAS.
[35] Vivek Sarkar,et al. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness , 1994, CASCON.
[36] Olivier Temam,et al. Characterizing the behavior of sparse algorithms on caches , 1992, Proceedings Supercomputing '92.
[37] Mithuna Thottethodi,et al. Nonlinear array layouts for hierarchical memory systems , 1999, ICS '99.
[38] Ken Kennedy,et al. Software methods for improvement of cache performance on supercomputer applications , 1989 .
[39] Trevor N. Mudge,et al. Trace-driven memory simulation: a survey , 1997, CSUR.
[40] Steven W. K. Tjiang,et al. SUIF: an infrastructure for research on parallelizing and optimizing compilers , 1994, SIGPLAN Notices.
[41] Olivier Temam,et al. Cache interference phenomena , 1994, SIGMETRICS.
[42] William Pugh,et al. The Omega test: A fast and practical integer programming algorithm for dependence analysis , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[43] Emilio L. Zapata,et al. Automatic analytical modeling for the estimation of cache misses , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques.
[44] Chau-Wen Tseng,et al. Data transformations for eliminating conflict misses , 1998, PLDI.
[45] Chau-Wen Tseng,et al. A Comparison of Locality Transformations for Irregular Codes , 2000, LCR.
[46] David B. Whalley,et al. Timing analysis for data caches and set-associative caches , 1997, Proceedings Third IEEE Real-Time Technology and Applications Symposium.
[47] Scott A. Mahlke,et al. Profile-guided automatic inline expansion for C programs , 1992, Softw. Pract. Exp.
[48] Toshiaki Yasue,et al. An Empirical Study of Method Inlining for a Java Just-in-Time Compiler , 2002, Java Virtual Machine Research and Technology Symposium.
[49] David A. Patterson,et al. Computer Architecture: A Quantitative Approach .
[50] Ken Kennedy,et al. Improving effective bandwidth through compiler enhancement of global and dynamic cache reuse , 2000 .
[51] Michael D. Smith,et al. Procedure placement using temporal-ordering information , 1999, TOPLAS.
[52] Zhiyuan Li,et al. New tiling techniques to improve cache temporal locality , 1999, PLDI '99.
[53] Jingling Xue,et al. Unimodular Transformations of Non-Perfectly Nested Loops , 1997, Parallel Comput.
[54] Ken Kennedy,et al. Improving effective bandwidth through compiler enhancement of global cache reuse , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[55] James R. Larus,et al. Exploiting hardware performance counters with flow and context sensitive profiling , 1997, PLDI '97.
[56] Ken Kennedy,et al. Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings , 2001, International Journal of Parallel Programming.
[57] Keshav Pingali,et al. Data-centric multi-level blocking , 1997, PLDI '97.