Compiler-assisted performance tuning
Robert F. Lucas | Pedro C. Diniz | Jacqueline Chame | Mary W. Hall | Chun Chen | Yoon-Ju Lee Nelson
[1] Chun Chen, et al. A Systematic Approach to Model-Guided Empirical Search for Memory Hierarchy Optimization, 2005, LCPC.
[2] Larry Carter, et al. Quantifying the Multi-Level Nature of Tiling Interactions, 1997, International Journal of Parallel Programming.
[3] Ken Kennedy, et al. Transforming Complex Loop Nests for Locality, 2004, The Journal of Supercomputing.
[4] Chun Chen, et al. Model-Guided Empirical Optimization for Multimedia Extension Architectures: A Case Study, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[5] Yoon-Ju Lee, et al. Empirical Optimization for a Sparse Linear Solver: A Case Study, 2005, International Journal of Parallel Programming.
[6] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[7] Sharad Malik, et al. Precise miss analysis for program transformations with caches of arbitrary associativity, 1998, ASPLOS VIII.
[8] Antoine Petitet, et al. Minimizing development and maintenance costs in supporting persistently optimized BLAS, 2005.
[9] Kathryn S. McKinley, et al. Tile size selection using cache organization and data layout, 1995, PLDI '95.
[10] Keshav Pingali, et al. Think globally, search locally, 2005, ICS '05.
[11] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[12] Michael E. Wolf, et al. Combining Loop Transformations Considering Caches and Scheduling, 2004, International Journal of Parallel Programming.
[13] Ayal Zaks, et al. Auto-vectorization of interleaved data for SIMD, 2006, PLDI '06.
[14] I-Hsin Chung, et al. A Case Study Using Automatic Performance Tuning for Large-Scale Scientific Programs, 2006, 2006 15th IEEE International Conference on High Performance Distributed Computing.
[15] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[16] Gang Ren, et al. A comparison of empirical and model-driven optimization, 2003, PLDI '03.
[17] Ken Kennedy, et al. Profitable loop fusion and tiling using model-driven empirical search, 2006, ICS '06.
[18] Marta Jiménez, et al. Register tiling in nonrectangular iteration spaces, 2002, TOPL.
[19] Chau-Wen Tseng, et al. Data transformations for eliminating conflict misses, 1998, PLDI.
[20] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[21] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[22] Chun Chen, et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy, 2005, International Symposium on Code Generation and Optimization.
[23] Gang Ren, et al. Optimizing data permutations for SIMD devices, 2006, PLDI '06.
[24] Yi Wang, et al. A Combined Hardware/Software Optimization Framework for Signal Representation and Recognition, 2007, International Conference on Computational Science.
[25] Chun Chen, et al. Model-guided empirical optimization for memory hierarchy, 2007.
[26] Monica S. Lam, et al. A data locality optimizing algorithm, 1991, PLDI '91.
[27] Robert J. Fowler, et al. HPCVIEW: A Tool for Top-down Analysis of Node Performance, 2002, The Journal of Supercomputing.
[28] Richard W. Vuduc, et al. POET: Parameterized Optimizations for Empirical Tuning, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[29] Chun Chen, et al. An overview of the ECO project, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[30] Ken Kennedy, et al. Improving the ratio of memory operations to floating-point operations in loops, 1994, TOPL.
[31] Chun Chen, et al. Intelligent Optimization of Parallel and Distributed Applications, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[32] Siddhartha Chatterjee, et al. Exact analysis of the cache behavior of nested loops, 2001, PLDI '01.
[33] Steven G. Johnson, et al. The Design and Implementation of FFTW3, 2005, Proceedings of the IEEE.