AUGEM: Automatically generate high performance Dense Linear Algebra kernels on x86 CPUs
Qian Wang | Qing Yi | Xianyi Zhang | Yunquan Zhang
[1] Qing Yi, et al. Layout-oblivious compiler optimization for matrix computations, 2013, TACO.
[2] James Demmel, et al. Communication-optimal parallel algorithm for Strassen's matrix multiplication, 2012, SPAA '12.
[3] Apan Qasem, et al. Exploring the Optimization Space of Dense Linear Algebra Kernels, 2008, LCPC.
[4] Chun Chen, et al. Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy, 2005, International Symposium on Code Generation and Optimization.
[5] Qing Yi, et al. Automated programmable control and parameterization of compiler optimizations, 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[6] Elizabeth R. Jessup, et al. Automating the generation of composed linear algebra kernels, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[7] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[8] Elizabeth R. Jessup, et al. Build to order linear algebra kernels, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[9] Richard W. Vuduc, et al. POET: Parameterized Optimizations for Empirical Tuning, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[10] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[11] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[12] Zhang Yunquan, et al. Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor, 2012, ICPADS.
[13] Alexandru Nicolau, et al. Adaptive Strassen's matrix multiplication, 2007, ICS '07.
[14] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[15] Qing Yi, et al. POET: a scripting language for applying parameterized source-to-source program transformations, 2012, Softw. Pract. Exp.
[16] Dongrui Fan, et al. Extendable pattern-oriented optimization directives, 2012, International Symposium on Code Generation and Optimization (CGO 2011).
[17] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Computing.
[18] Keith D. Cooper, et al. Engineering a Compiler, 2003.
[19] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[20] Yang Yang, et al. Automatic Library Generation for BLAS3 on GPUs, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.