Speeding up Nek5000 with autotuning and specialization
Chun Chen | Paul D. Hovland | Jaewook Shin | Jacqueline Chame | Mary W. Hall | Paul F. Fischer
[1] A. Patera. A spectral element method for fluid dynamics: Laminar flow in a channel expansion, 1984.
[2] Steven G. Johnson, et al. The Fastest Fourier Transform in the West, 1997.
[3] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[4] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[5] H.M. Tufo, et al. Terascale Spectral Element Algorithms and Implementations, 1999, ACM/IEEE SC 1999 Conference (SC'99).
[6] Jack J. Dongarra, et al. A Portable Programming Interface for Performance Evaluation on Modern Processors, 2000, Int. J. High Perform. Comput. Appl.
[7] Robert A. van de Geijn, et al. High-Performance Matrix Multiplication Algorithms for Architectures with Hierarchical Memories, 2001.
[8] Robert A. van de Geijn, et al. FLAME: Formal Linear Algebra Methods Environment, 2001, TOMS.
[9] P. Fischer, et al. High-Order Methods for Incompressible Fluid Flow, 2002.
[10] Juan J. Navarro, et al. Improving Performance of Hypermatrix Cholesky Factorization, 2003, Euro-Par.
[11] Yunheung Paek, et al. Finding effective optimization phase sequences, 2003.
[12] Jaewook Shin, et al. Exploiting Superword-Level Locality in Multimedia Extension Architectures, 2003, J. Instr. Level Parallelism.
[13] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[14] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[15] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[16] Chun Chen, et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy, 2005, International Symposium on Code Generation and Optimization.
[17] William Jalby, et al. Iterative Compilation with Kernel Exploration, 2006, LCPC.
[18] Richard W. Vuduc, et al. POET: Parameterized Optimizations for Empirical Tuning, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[19] Mary W. Hall, et al. CHiLL: A Framework for Composing High-Level Loop Transformations, 2007.
[20] Chun Chen, et al. Model-guided empirical optimization for memory hierarchy, 2007.
[21] Albert Cohen, et al. Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time, 2007, International Symposium on Code Generation and Optimization (CGO'07).
[22] P. Fischer, et al. Petascale algorithms for reactor hydrodynamics, 2008.
[23] Albert Cohen, et al. Iterative Optimization in the Polyhedral Model: Part II, Multidimensional Time, 2008, PLDI '08.
[24] Ayal Zaks, et al. Outer-loop vectorization - revisited for short SIMD architectures, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[25] Chun Chen, et al. Loop Transformation Recipes for Code Generation and Auto-Tuning, 2009, LCPC.
[26] Markus Püschel, et al. Computer Generation of General Size Linear Transform Libraries, 2009, 2009 International Symposium on Code Generation and Optimization.
[27] P. Sadayappan, et al. Annotation-based empirical performance tuning using Orio, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[28] Chun Chen, et al. A scalable auto-tuning framework for compiler optimization, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[29] Chun Chen, et al. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, 2010, Software Automatic Tuning, From Concepts to State-of-the-Art Results.