Automated transformation for performance-critical kernels
[1] David E. Bernholdt,et al. Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.
[2] Robert A. van de Geijn,et al. The science of deriving dense linear algebra algorithms , 2005, TOMS.
[3] Ken Kennedy,et al. Automatic tuning of whole applications using direct search and a performance-based transformation system , 2006, The Journal of Supercomputing.
[4] Ken Kennedy,et al. A Cache-Conscious Profitability Model for Empirical Tuning of Loop Fusion , 2005, LCPC.
[5] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[6] Mark Stephenson,et al. Predicting unroll factors using supervised classification , 2005, International Symposium on Code Generation and Optimization.
[7] Paul N. Hilfinger,et al. Better Tiling and Array Contraction for Compiling Scientific Programs , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[8] Magne Haveraaen,et al. Design of the CodeBoost transformation system for domain-specific optimisation of C++ programs , 2003, Proceedings Third IEEE International Workshop on Source Code Analysis and Manipulation.
[9] Qing Yi,et al. Parameterizing loop fusion for automated empirical tuning , 2005.
[10] David Parello,et al. Facilitating the search for compositions of program transformations , 2005, ICS '05.
[11] Gang Ren,et al. Is Search Really Necessary to Generate High-Performance BLAS? , 2005, Proceedings of the IEEE.
[12] James Demmel,et al. Statistical Models for Automatic Performance Tuning , 2001, International Conference on Computational Science.
[13] Eelco Visser,et al. A survey of strategies in rule-based program transformation systems , 2005, J. Symb. Comput.
[14] Paul H. J. Kelly,et al. Runtime Code Generation in C++ as a Foundation for Domain-Specific Optimisation , 2003, Domain-Specific Program Generation.
[15] David A. Padua,et al. A Language for the Compact Representation of Multiple Program Versions , 2005, LCPC.
[16] R. C. Whaley,et al. Timing high performance kernels through empirical compilation , 2005, 2005 International Conference on Parallel Processing (ICPP'05).
[17] Albert Cohen,et al. A Practical Method for Quickly Evaluating Program Optimizations , 2005, HiPEAC.
[18] Dennis Gannon,et al. Active Libraries: Rethinking the roles of compilers and libraries , 1998, ArXiv.
[19] Chun Chen,et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy , 2005, International Symposium on Code Generation and Optimization.
[20] Steven G. Johnson,et al. The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.
[21] Ken Kennedy,et al. Transforming Complex Loop Nests for Locality , 2004, The Journal of Supercomputing.
[22] R. C. Whaley,et al. Minimizing development and maintenance costs in supporting persistently optimized BLAS , 2005, Softw. Pract. Exp.
[23] Michael F. P. O'Boyle,et al. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation , 2004, The Journal of Supercomputing.
[24] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[25] James Demmel,et al. Statistical Models for Empirical Search-Based Performance Tuning , 2004, Int. J. High Perform. Comput. Appl.
[26] Richard W. Vuduc,et al. POET: Parameterized Optimizations for Empirical Tuning , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[27] Markus Mock,et al. DyC: an expressive annotation-directed dynamic compiler for C , 2000, Theor. Comput. Sci.
[28] Peter Sestoft,et al. Partial evaluation and automatic program generation , 1993, Prentice Hall international series in computer science.
[29] Victor Eijkhout,et al. Self-Adapting Linear Algebra Algorithms and Software , 2005, Proceedings of the IEEE.
[30] Antoine Petitet,et al. Minimizing development and maintenance costs in supporting persistently optimized BLAS , 2005.
[31] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput.
[32] Dawson R. Engler,et al. `C: a language for high-level, efficient, and machine-independent dynamic code generation , 1995, POPL '96.
[33] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).