[1] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[2] Robert A. van de Geijn, et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality, 2015, ACM Trans. Math. Softw.
[3] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, ACM Trans. Math. Softw.
[4] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[5] Bo Kågström, et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark, 1998, ACM Trans. Math. Softw.
[6] Robert A. van de Geijn, et al. Unleashing the high-performance and low-power of multi-core DSPs for general-purpose HPC, 2012, International Conference for High Performance Computing, Networking, Storage and Analysis.
[7] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[8] Robert A. van de Geijn, et al. Anatomy of High-Performance Many-Threaded Matrix Multiplication, 2014, IEEE 28th International Parallel and Distributed Processing Symposium.
[9] Jack J. Dongarra, et al. An extended set of FORTRAN basic linear algebra subprograms, 1988, ACM Trans. Math. Softw.
[10] Tze Meng Low, et al. Analytical Modeling Is Enough for High-Performance BLIS, 2016, ACM Trans. Math. Softw.
[11] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, ACM Trans. Math. Softw.
[12] Ramesh C. Agarwal, et al. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms, 1994, IBM J. Res. Dev.
[13] Robert A. van de Geijn, et al. A Family of High-Performance Matrix Multiplication Algorithms, 2001, International Conference on Computational Science.
[14] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, ACM Trans. Math. Softw.
[15] Yuefan Deng, et al. New trends in high performance computing, 2001, Parallel Comput.
[16] Tze Meng Low, et al. The BLIS Framework, 2016.