Automatic Library Generation for BLAS3 on GPUs
Yang Yang | Lei Wang | Xiaobing Feng | Jingling Xue | Huimin Cui
[1] Chun Chen,et al. Loop Transformation Recipes for Code Generation and Auto-Tuning , 2009, LCPC.
[2] J. Ramanujam,et al. Automatic C-to-CUDA Code Generation for Affine Programs , 2010, CC.
[3] Samuel Williams,et al. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Michael F. P. O'Boyle,et al. Using machine learning to focus iterative optimization , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[5] Samuel Williams,et al. Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[6] Gang Ren,et al. A comparison of empirical and model-driven optimization , 2003, PLDI '03.
[7] Uday Bondhugula,et al. Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories , 2008, PPoPP.
[8] Chun Chen,et al. A scalable auto-tuning framework for compiler optimization , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[9] Jingling Xue,et al. Loop Tiling for Parallelism , 2000, Kluwer International Series in Engineering and Computer Science.
[10] David Parello,et al. Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.
[11] William Jalby,et al. Iterative Compilation with Kernel Exploration , 2006, LCPC.
[12] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[13] Michael F. P. O'Boyle,et al. MILEPOST GCC: machine learning based research compiler , 2008.
[14] David A. Padua,et al. A Language for the Compact Representation of Multiple Program Versions , 2005, LCPC.
[15] James Demmel,et al. Benchmarking GPUs to tune dense linear algebra , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Matteo Frigo,et al. A fast Fourier transform compiler , 1999, SIGP.
[17] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[18] Peter M. W. Knijnenburg,et al. Iterative compilation in a non-linear optimisation space , 1998.
[19] Olivier Temam,et al. Collective Optimization , 2008, HiPEAC.
[20] Chun Chen,et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy , 2005, International Symposium on Code Generation and Optimization.
[21] L. Almagor,et al. Finding effective compilation sequences , 2004, LCTES '04.
[22] Yunheung Paek,et al. Finding effective optimization phase sequences , 2003, LCTES '03.
[23] Dongrui Fan,et al. Extendable pattern-oriented optimization directives , 2012, International Symposium on Code Generation and Optimization (CGO 2011).
[24] Albert Cohen,et al. A Practical Method for Quickly Evaluating Program Optimizations , 2005, HiPEAC.
[25] Michael F. P. O'Boyle,et al. A Feasibility Study in Iterative Compilation , 1999, ISHPC.
[26] Keith D. Cooper,et al. Optimizing for reduced code space using genetic algorithms , 1999, LCTES '99.
[27] Jack Dongarra,et al. Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects , 2009.
[28] John Cavazos,et al. Inducing heuristics to decide whether to schedule , 2004, PLDI '04.
[29] Franz Franchetti,et al. SPIRAL: Code Generation for DSP Transforms , 2005, Proceedings of the IEEE.
[30] Xipeng Shen,et al. A cross-input adaptive framework for GPU program optimizations , 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.