Generalizing Matrix Multiplication for Efficient Computations on Modern Computers
[1] B. David Saunders, et al. Transitive Closure and Related Semiring Properties via Eliminants, 1985, Theor. Comput. Sci.
[2] Jack J. Dongarra, et al. An extended set of FORTRAN basic linear algebra subprograms, 1988, TOMS.
[3] Richard M. Russell, et al. The CRAY-1 computer system, 1978, CACM.
[4] C. Jiang. An Optimal Algorithm for Matrix Multiplication, 1990.
[5] Jack J. Dongarra, et al. A proposal for a set of level 3 basic linear algebra subprograms, 1987, SGNM.
[6] Anthony Skjellum, et al. A poly-algorithm for parallel dense matrix multiplication on two-dimensional process grid topologies, 1997, Concurr. Pract. Exp.
[7] David H. Bailey, et al. Using Strassen's algorithm to accelerate the solution of linear systems, 1991, The Journal of Supercomputing.
[8] Jack Dongarra, et al. Experiments with Strassen's Algorithm: From Sequential to Parallel, 2006.
[9] Robert A. van de Geijn, et al. A High Performance Parallel Strassen Implementation, 1995, Parallel Process. Lett.
[10] Erdem Hokenek, et al. Design of the IBM RISC System/6000 Floating-Point Execution Unit, 1990, IBM J. Res. Dev.
[11] Thomas Rauber, et al. Combining building blocks for parallel multi-level matrix multiplication, 2008, Parallel Comput.
[12] Daniel J. Lehmann, et al. Algebraic Structures for Transitive Closure, 1976, Theor. Comput. Sci.
[13] Toshiaki Miyazaki, et al. Orbital Systolic Algorithms and Array Processors for Solution of the Algebraic Path Problem, 2010, IEICE Trans. Inf. Syst.
[14] José E. Moreira, et al. The fused multiply-add instruction leads to algorithms for extended-precision floating point: applications to Java and high-performance computing, 1999, CASCON.
[15] Garrett Birkhoff, et al. A survey of modern algebra, 1942.
[16] Jack Dongarra, et al. MPI: The Complete Reference, 1996.
[17] Charles L. Lawson, et al. Basic Linear Algebra Subprograms for Fortran Usage, 1979, TOMS.
[18] Toshiaki Miyazaki, et al. Orbital Algorithms and Unified Array Processor for Computing 2D Separable Transforms, 2010, 2010 39th International Conference on Parallel Processing Workshops.
[19] Stanislav G. Sedukhin, et al. A Solution of the All-Pairs Shortest Paths Problem on the Cell Broadband Engine Processor, 2009, IEICE Trans. Inf. Syst.
[20] George L.-T. Chiu, et al. Overview of the Blue Gene/L system architecture, 2005, IBM J. Res. Dev.
[21] Toshiaki Miyazaki, et al. Rapid*Closure: Algebraic Extensions of a Scalar Multiply-add Operation, 2010, CATA.