Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
[1] Gang Ren, et al. A comparison of empirical and model-driven optimization, 2003, PLDI '03.
[2] Viktor K. Prasanna, et al. Tiling, Block Data Layout, and Memory Hierarchy Performance, 2003, IEEE Trans. Parallel Distributed Syst.
[3] W. Jalby, et al. To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts, 1993, Supercomputing '93.
[4] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI.
[5] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[6] Matteo Frigo. A Fast Fourier Transform Compiler, 1999, PLDI.
[7] Yoon-Ju Lee, et al. A Code Isolator: Isolating Code Fragments from Large Programs, 2004, LCPC.
[8] Saman P. Amarasinghe, et al. Meta optimization: improving compiler heuristics with machine learning, 2003, PLDI '03.
[9] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[10] Michael E. Wolf, et al. Improving locality and parallelism in nested loops, 1992.
[11] Michael E. Wolf, et al. The cache performance and optimizations of blocked algorithms, 1991, ASPLOS IV.
[12] Siddhartha Chatterjee, et al. Exact analysis of the cache behavior of nested loops, 2001, PLDI '01.
[13] Yoon-Ju Lee, et al. Empirical Optimization for a Sparse Linear Solver: A Case Study, 2005, International Journal of Parallel Programming.
[14] Michael E. Wolf, et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.
[15] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[16] Paul N. Hilfinger, et al. Better Tiling and Array Contraction for Compiling Scientific Programs, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[17] Sharad Malik, et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.
[18] Vivek Sarkar, et al. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness, 1994, CASCON.
[19] Jack J. Dongarra, et al. A Portable Programming Interface for Performance Evaluation on Modern Processors, 2000, Int. J. High Perform. Comput. Appl.
[20] B. Singer, et al. Stochastic Search for Signal Processing Algorithm Optimization, 2001, ACM/IEEE SC 2001 Conference (SC'01).
[21] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[22] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[23] Larry Carter, et al. Quantifying the Multi-Level Nature of Tiling Interactions, 1997, International Journal of Parallel Programming.
[24] Todd C. Mowry, et al. Compiler-directed page coloring for multiprocessors, 1996, ASPLOS VII.
[25] David A. Padua, et al. SPL: a language and compiler for DSP algorithms, 2001, PLDI '01.
[26] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPLAS.
[27] Josep Llosa, et al. Optimizing program locality through CMEs and GAs, 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[28] Chun Chen, et al. A Systematic Approach to Model-Guided Empirical Search for Memory Hierarchy Optimization, 2005, LCPC.
[29] Michael F. P. O'Boyle, et al. The effect of cache models on iterative compilation for combined tiling and unrolling, 2004, Concurr. Comput. Pract. Exp.
[30] Keith D. Cooper, et al. Optimizing for reduced code space using genetic algorithms, 1999, LCTES '99.