Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems: Matrix-multiplication and matrix-addition algorithm optimizations by software pipelining and threads allocation