A case for a working-set-based memory hierarchy
[1] Jaejin Lee, et al. Using prime numbers for cache indexing to eliminate conflict misses, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[2] Chau-Wen Tseng, et al. A Comparison of Compiler Tiling Algorithms, 1999, CC.
[3] Mahmut T. Kandemir, et al. Improving locality using loop and data transformations in an integrated framework, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[4] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.
[5] Qing Yang, et al. A novel cache design for vector processing, 1992, ISCA '92.
[6] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[7] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI.
[8] Sharad Malik, et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.
[9] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPL.
[10] Vivek Sarkar, et al. Optimization of array accesses by collective loop transformations, 1991, ICS '91.
[11] Dennis Gannon, et al. Strategies for cache and local memory management by global program transformation, 1988, J. Parallel Distributed Comput.
[12] Steve Carr, et al. Compiler blockability of dense matrix factorizations, 1997, TOMS.
[13] José González, et al. The design and performance of a conflict-avoiding cache, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[14] V. Sarkar, et al. Collective Loop Fusion for Array Contraction, 1992, LCPC.
[15] Sharad Malik, et al. Cache miss equations: an analytical representation of cache misses, 1997, ICS '97.
[16] François Bodin, et al. Skewed associativity enhances performance predictability, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[17] Ken Kennedy, et al. Improving register allocation for subscripted variables, 1990, SIGP.
[18] Ken Kennedy. Fast greedy weighted fusion, 2000, ICS '00.
[19] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[20] Mahmut T. Kandemir, et al. A compiler algorithm for optimizing locality in loop nests, 1997, ICS '97.
[21] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.
[22] Ken Kennedy, et al. Vector Register Allocation, 1992, IEEE Trans. Computers.
[23] Olivier Temam, et al. A quantitative analysis of loop nest locality, 1996, ASPLOS VII.
[24] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[25] Josep Llosa, et al. Optimizing program locality through CMEs and GAs, 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.