Paper information - Generalizing Matrix Multiplication for Efficient Computations on Modern Computers - 字舞流文

Generalizing Matrix Multiplication for Efficient Computations on Modern Computers

Recent advances in computing allow taking a new look at matrix multiplication. The key ideas are decreasing interest in recursion, the development of processors with thousands (potentially millions) of processing units, and influences from the Algebraic Path Problems. In this context, we propose a generalized matrix-matrix multiply-add (MMA) operation and illustrate its usability. Furthermore, we elaborate on the interrelation between this generalization and the BLAS standard.
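The abstract does not spell out the generalized MMA operation, so the following is only an illustrative sketch of one common reading: the update C = C (+) A (x) B, where the scalar "add" and "multiply" are supplied by the caller, as in semiring formulations of the Algebraic Path Problem. The function name `generalized_mma` and its parameters are hypothetical and are not taken from the paper.

```python
import operator
from typing import Callable, List

Matrix = List[List[float]]
ScalarOp = Callable[[float, float], float]


def generalized_mma(c: Matrix, a: Matrix, b: Matrix,
                    add: ScalarOp, mul: ScalarOp) -> Matrix:
    """Return C (+) A (x) B for caller-supplied scalar operations.

    Hypothetical helper for illustration only: `add` plays the role of
    scalar addition and `mul` the role of scalar multiplication.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [row[:] for row in c]  # copy so the input C is left untouched
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for p in range(inner):
                acc = add(acc, mul(a[i][p], b[p][j]))
            out[i][j] = acc
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

# Ordinary arithmetic (+, *): the familiar multiply-add, starting from C = 0.
zeros = [[0.0, 0.0], [0.0, 0.0]]
linear = generalized_mma(zeros, a, b, operator.add, operator.mul)

# (min, +): one relaxation step of a shortest-path style computation,
# starting from C filled with infinity (the identity element for min).
inf = float("inf")
infs = [[inf, inf], [inf, inf]]
min_plus = generalized_mma(infs, a, b, min, operator.add)
```

With ordinary arithmetic the sketch reduces to the standard multiply-add; swapping in `min` and `+` changes only the two scalar operations while the loop structure stays the same.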

Marcin Paprzycki | Stanislav G. Sedukhin

[1] B. David Saunders, et al. Transitive Closure and Related Semiring Properties via Eliminants, 1985, Theor. Comput. Sci.

[2] Jack J. Dongarra, et al. An extended set of FORTRAN basic linear algebra subprograms, 1988, TOMS.

[3] Richard M. Russell, et al. The CRAY-1 computer system, 1978, CACM.

[4] C. Jiang. An Optimal Algorithm for Matrix Multiplication, 1990.

[5] Jack J. Dongarra, et al. A proposal for a set of level 3 basic linear algebra subprograms, 1987, SGNM.

[6] Anthony Skjellum, et al. A poly-algorithm for parallel dense matrix multiplication on two-dimensional process grid topologies, 1997, Concurr. Pract. Exp.

[7] David H. Bailey, et al. Using Strassen's algorithm to accelerate the solution of linear systems, 1991, The Journal of Supercomputing.

[8] Jack Dongarra, et al. Experiments with Strassen's Algorithm: From Sequential to Parallel, 2006.

[9] Robert A. van de Geijn, et al. A High Performance Parallel Strassen Implementation, 1995, Parallel Process. Lett.

[10] Erdem Hokenek, et al. Design of the IBM RISC System/6000 Floating-Point Execution Unit, 1990, IBM J. Res. Dev.

[11] Thomas Rauber, et al. Combining building blocks for parallel multi-level matrix multiplication, 2008, Parallel Comput.

[12] Daniel J. Lehmann, et al. Algebraic Structures for Transitive Closure, 1976, Theor. Comput. Sci.

[13] Toshiaki Miyazaki, et al. Orbital Systolic Algorithms and Array Processors for Solution of the Algebraic Path Problem, 2010, IEICE Trans. Inf. Syst.

[14] José E. Moreira, et al. The fused multiply-add instruction leads to algorithms for extended-precision floating point: applications to Java and high-performance computing, 1999, CASCON.

[15] Garrett Birkhoff, et al. A survey of modern algebra, 1942.

[16] Jack Dongarra, et al. MPI: The Complete Reference, 1996.

[17] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.

[18] Toshiaki Miyazaki, et al. Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms, 2010, 2010 39th International Conference on Parallel Processing Workshops.

[19] Stanislav G. Sedukhin, et al. A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor, 2009, IEICE Trans. Inf. Syst.

[20] George L.-T. Chiu, et al. Overview of the Blue Gene/L system architecture, 2005, IBM J. Res. Dev.

[21] Toshiaki Miyazaki, et al. Rapid*Closure: Algebraic Extensions of a Scalar Multiply-add Operation, 2010, CATA.