Implementing Strassen's Algorithm with BLIS
Jianyu Huang | Tyler M. Smith | Greg M. Henry | Robert A. van de Geijn
[1] Michael A. Heroux, et al. GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm, 1994, Journal of Computational Physics.
[2] Shmuel Winograd. On multiplication of 2 × 2 matrices, 1971.
[3] Jianyu Huang, et al. Performance optimization for the k-nearest neighbors kernel on x86 architectures, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[4] Jean-Guillaume Dumas, et al. Memory efficient scheduling of Strassen-Winograd's matrix multiplication algorithm, 2009, ISSAC '09.
[5] Robert A. van de Geijn, et al. A High Performance Parallel Strassen Implementation, 1995, Parallel Process. Lett.
[6] Austin R. Benson, et al. A framework for practical parallel fast matrix multiplication, 2015, PPoPP.
[7] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.
[8] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[9] Robert A. van de Geijn, et al. Anatomy of High-Performance Many-Threaded Matrix Multiplication, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[10] Jack Dongarra, et al. ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers, 1992, Proceedings of the Fourth Symposium on the Frontiers of Massively Parallel Computation.
[11] Arnold Schönhage. Partial and Total Matrix Multiplication, 1981, SIAM J. Comput.
[12] Wei Huang, et al. Design of High Performance MVAPICH2: MPI2 over InfiniBand, 2006, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06).
[13] Robert A. van de Geijn, et al. High-performance implementation of the level-3 BLAS, 2008, TOMS.
[14] Field G. Van Zee, et al. Implementing High-performance Complex Matrix Multiplication via the 3m and 4m Methods, 2017, ACM Trans. Math. Softw.
[15] J. R. Johnson, et al. Implementation of Strassen's Algorithm for Matrix Multiplication, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[16] Oded Schwartz, et al. Improving the Numerical Stability of Fast Matrix Multiplication, 2015, SIAM J. Matrix Anal. Appl.
[17] James Demmel, et al. Fast matrix multiplication is stable, 2006, Numerische Mathematik.
[18] Robert A. van de Geijn, et al. SUMMA: scalable universal matrix multiplication algorithm, 1995, Concurr. Pract. Exp.
[19] Robert A. van de Geijn, et al. BLIS: A Framework for Rapidly Instantiating BLAS Functionality, 2015, ACM Trans. Math. Softw.
[20] V. Strassen. Gaussian elimination is not optimal, 1969.
[21] A. Smirnov. The bilinear complexity and practical algorithms for matrix multiplication, 2013.
[22] Alexandru Nicolau, et al. Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems: Matrix-multiplication and matrix-addition algorithm optimizations by software pipelining and threads allocation, 2011, TOMS.
[23] James Demmel, et al. Communication-Avoiding Parallel Strassen: Implementation and performance, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.