A Cost Model For Integrated Restructuring Optimizations
Sally A. McKee | Wilson C. Hsieh | John B. Carter | Bharat Chandramouli
[1] John B. Carter, et al. Efficient remapping mechanisms for an adaptable memory system, 2002.
[2] Irvin D. Rutman, et al. Remains to be seen, 1995.
[3] Sarita V. Adve, et al. RSIM Reference Manual: Version 1.0, 1997.
[4] Ken Kennedy, et al. Improving register allocation for subscripted variables, 1990, PLDI '90.
[5] David F. Bacon, et al. Compiler transformations for high-performance computing, 1994, CSUR.
[6] John Zahorjan, et al. Array restructuring for cache locality, 1996.
[7] Jeremy D. Frens, et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code, 1997, PPOPP '97.
[8] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[9] Ulrich Kremer, et al. NP-completeness of Dynamic Remapping, 1993.
[10] Rafael H. Saavedra-Barrera, et al. Machine Characterization and Benchmark Performance Prediction, 1988.
[11] Erik Brunvand, et al. Impulse: building a smarter memory controller, 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[12] Wei Li, et al. Unifying data and control transformations for distributed shared-memory machines, 1995, PLDI '95.
[13] Chau-Wen Tseng, et al. Improving data locality with loop transformations, 1996, TOPLAS.
[14] Charles E. Leiserson, et al. Cache-Oblivious Algorithms, 2003, CIAC.
[15] John Zahorjan, et al. Optimizing Data Locality by Array Restructuring, 1995.
[16] Leigh Stoller, et al. Increasing TLB reach using superpages backed by shadow memory, 1998, ISCA.
[17] Monica S. Lam, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[18] Mahmut T. Kandemir, et al. Optimizing inter-nest data locality, 2002, CASES '02.
[19] Mithuna Thottethodi, et al. Nonlinear array layouts for hierarchical memory systems, 1999, ICS '99.
[20] James R. Larus, et al. Cache-conscious structure layout, 1999, PLDI '99.
[21] Mark D. Hill, et al. Surpassing the TLB performance of superpages with less operating system support, 1994, ASPLOS VI.
[22] Mahmut T. Kandemir, et al. Improving locality using loop and data transformations in an integrated framework, 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[23] Lixin Zhang. URSIM Reference Manual, 1999.
[24] Kathryn S. McKinley, et al. Compiling for the Impulse memory controller, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[25] Sally A. McKee, et al. A cost framework for evaluating integrated restructuring optimizations, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[26] James R. Larus, et al. EEL: machine-independent executable editing, 1995, PLDI '95.
[27] Sally A. McKee, et al. Memory system support for image processing, 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques.
[28] F. H. McMahon, et al. The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range, 1986.
[29] Mahmut T. Kandemir, et al. A hyperplane based approach for optimizing spatial locality in loop nests, 1998, ICS '98.
[30] Sarita V. Adve, et al. RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors, 1997.
[31] Chau-Wen Tseng, et al. Compiler optimizations for improving data locality, 1994, ASPLOS VI.
[32] Anoop Gupta, et al. Design and evaluation of a compiler algorithm for prefetching, 1992, ASPLOS V.
[33] Ken Kennedy, et al. Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution, 1993, LCPC.
[34] Sharad Malik, et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.
[35] Alan Jay Smith, et al. Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes, 1995, IEEE Trans. Computers.