Capturing the Expert: Generating Fast Matrix-Multiply Kernels with Spiral
[1] Franz Franchetti, et al. Operator Language: A Program Generation Framework for Fast Kernels, 2009, DSL.
[2] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[3] Monica S. Lam, et al. Retrospective: Software Pipelining: An Effective Scheduling Technique for VLIW Machines, 1998.
[4] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[5] Robert A. van de Geijn, et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality, 2015, ACM Trans. Math. Softw.
[6] Bryan Marker. Design by transformation: from domain knowledge to optimized program generation, 2014.
[7] Qian Wang, et al. AUGEM: Automatically generate high performance Dense Linear Algebra kernels on x86 CPUs, 2013, SC: International Conference for High Performance Computing, Networking, Storage and Analysis.
[8] Robert A. van de Geijn, et al. Code Generation and Optimization of Distributed-Memory Dense Linear Algebra Kernels, 2013, ICCS.
[9] Markus Püschel, et al. A Basic Linear Algebra Compiler, 2014, CGO '14.
[10] Franz Franchetti, et al. Formal loop merging for signal transforms, 2005, PLDI '05.
[11] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[12] Elizabeth R. Jessup, et al. Build to order linear algebra kernels, 2008, IEEE International Symposium on Parallel and Distributed Processing.