BLIS: A Modern Alternative to the BLAS
[1] Franz Franchetti, et al. SPIRAL: Code Generation for DSP Transforms, 2005, Proceedings of the IEEE.
[2] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[3] Robert A. van de Geijn, et al. Formal Methods for High-Performance Linear Algebra Libraries, 2000, The Architecture of Scientific Software.
[4] Robert A. van de Geijn, et al. A Family of High-Performance Matrix Multiplication Algorithms, 2001, International Conference on Computational Science.
[5] Robert A. van de Geijn, et al. The science of deriving dense linear algebra algorithms, 2005, TOMS.
[6] Robert A. van de Geijn, et al. Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer, 2012, VECPAR.
[7] Robert A. van de Geijn, et al. The libflame Library for Dense Matrix Computations, 2009, Computing in Science & Engineering.
[8] James Demmel, et al. Cache efficient bidiagonalization using BLAS 2.5 operators, 2008, TOMS.
[9] James Demmel, et al. A preliminary analysis of Cyclops Tensor Framework, 2012.
[10] Robert A. van de Geijn, et al. Programming matrix algorithms-by-blocks for thread-level parallelism, 2009, TOMS.
[11] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[12] Robert A. van de Geijn, et al. Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures, 2012, IEEE Transactions on Computers.
[13] Bo Kågström, et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark, 1998, TOMS.
[14] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.
[15] Robert A. van de Geijn, et al. SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, 2008, PPoPP.
[16] Gang Ren, et al. Is Search Really Necessary to Generate High-Performance BLAS?, 2005, Proceedings of the IEEE.
[17] Jack J. Dongarra, et al. An extended set of FORTRAN basic linear algebra subprograms, 1988, TOMS.
[18] Ramesh C. Agarwal, et al. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms, 1994, IBM J. Res. Dev.
[19] Elizabeth R. Jessup, et al. Build to order linear algebra kernels, 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[20] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.
[21] Robert A. van de Geijn, et al. Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance, 2014, TOMS.
[22] Elizabeth R. Jessup, et al. Automating the generation of composed linear algebra kernels, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[23] Robert A. van de Geijn, et al. Families of Algorithms for Reducing a Matrix to Condensed Form, 2012, TOMS.
[24] Tze Meng Low, et al. Accumulating Householder transformations, revisited, 2006, TOMS.
[25] Robert A. van de Geijn, et al. FLAME: Formal Linear Algebra Methods Environment, 2001, TOMS.
[26] Ed Anderson, et al. LAPACK Users' Guide, 1995.
[27] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[28] James Demmel, et al. LU, QR and Cholesky Factorizations using Vector Capabilities of GPUs, 2008.
[29] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.