SIAM REVIEW c ○ 2004 Society for Industrial and Applied Mathematics Vol. 46, No. 1, pp. 3–45 Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software ∗
暂无分享,去创建一个
Erik Elmroth | Isak Jonsson | Bo Kågström | Fred G. Gustavson | F. Gustavson | B. Kågström | E. Elmroth | Isak Jonsson
[1] Isak Jonsson,et al. Parallel Triangular Sylvester-Type Matrix Equation Solvers for SMP Systems Using Recursive Blocking , 2000, PARA.
[2] F. G. Gustavson,et al. High-performance linear algebra algorithms using new generalized data structures for matrices , 2003, IBM J. Res. Dev..
[3] V. Strassen. Gaussian elimination is not optimal , 1969 .
[4] Steve Carr,et al. Compiler blockability of dense matrix factorizations , 1997, TOMS.
[5] Dror Irony,et al. Parallel and Fully Recursive Multifrontal Supernodal Sparse Cholesky , 2002, International Conference on Computational Science.
[6] Keshav Pingali,et al. Automatic Generation of Block-Recursive Codes , 2000, Euro-Par.
[7] Para,et al. Applied Parallel Computing Large Scale Scientific and Industrial Problems , 1998, Lecture Notes in Computer Science.
[8] Erik Elmroth,et al. High-Performance Library Software for QR Factorization , 2000, PARA.
[9] Isak Jonsson,et al. Recursive blocked algorithms for solving triangular systems—Part II: two-sided and generalized Sylvester and Lyapunov matrix equations , 2002, TOMS.
[10] Sivan Toledo,et al. Very Large Electronic Structure Calculations Using an Out-of-Core Filter-Diagonalization Method , 2002 .
[11] G. Forsythe,et al. Computer solution of linear algebraic systems , 1969 .
[12] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[13] James Demmel,et al. Statistical Models for Automatic Performance Tuning , 2001, International Conference on Computational Science.
[14] Robert A. van de Geijn,et al. Formal Linear Algebra Methods Environment (FLAME) Overview , 2000 .
[15] Ken Kennedy,et al. Transforming loops to recursion for multi-level memory hierarchies , 2000, PLDI '00.
[16] Antony Jameson,et al. Solution of the Equation $AX + XB = C$ by Inversion of an $M \times M$ or $N \times N$ Matrix , 1968 .
[17] Jack J. Dongarra,et al. A proposal for a set of level 3 basic linear algebra subprograms , 1987, SGNM.
[18] Jack Dongarra,et al. LAPACK Users' Guide, 3rd ed. , 1999 .
[19] Bo Kågström,et al. Distributed and Shared Memory Block Algorithms for the Triangular Sylvester Equation with øperatornamesep - 1 Estimators , 1992, SIAM J. Matrix Anal. Appl..
[20] G. Golub,et al. A Hessenberg-Schur method for the problem AX + XB= C , 1979 .
[21] Isak Jonsson,et al. Minimal-storage high-performance Cholesky factorization via blocking and recursion , 2000, IBM J. Res. Dev..
[22] B. Kågström,et al. Generalized Schur methods with condition estimators for solving the generalized Sylvester equation , 1989 .
[23] Y. Wang,et al. The Influence of the Compiler on the Cost of Mathematical Software—in Particular on the Cost of Triangular Factorization , 1975, TOMS.
[24] Michael E. Wolf,et al. The cache performance and optimizations of blocked algorithms , 1991, ASPLOS IV.
[25] Christian H. Bischof,et al. The WY representation for products of householder matrices , 1985, PPSC.
[26] G. Forsythe,et al. Computer solution of linear algebraic systems , 1969 .
[27] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[28] Victor Eijkhout,et al. Recursive approach in sparse matrix LU factorization , 2001, Sci. Program..
[29] Piet Hut,et al. A hierarchical O(N log N) force-calculation algorithm , 1986, Nature.
[30] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[31] Krister Dackland,et al. Blocked algorithms and software for reduction of a regular matrix pair to generalized Schur form , 1999, TOMS.
[32] Fred G. Gustavson,et al. A recursive formulation of Cholesky factorization of a matrix in packed storage , 2001, TOMS.
[33] Isak Jonsson,et al. High-Performance Matrix Multiplication on the IBM SP High Node , 1998 .
[34] Nicholas J. Higham,et al. Perturbation theory and backward error forAX−XB=C , 1993 .
[35] S. Hammarling. Numerical Solution of the Stable, Non-negative Definite Lyapunov Equation , 1982 .
[36] Isak Jonsson,et al. RECSY - A High Performance Library for Sylvester-Type Matrix Equations , 2003, Euro-Par.
[37] Mei Han An,et al. accuracy and stability of numerical algorithms , 1991 .
[38] BO K Agstr,et al. A GENERALIZED STATE-SPACE APPROACH FOR THE ADDITIVE DECOMPOSITION OF A TRANSFER MATRIX , 1992 .
[39] Mithuna Thottethodi,et al. Recursive Array Layouts and Fast Matrix Multiplication , 2002, IEEE Trans. Parallel Distributed Syst..
[40] Sivan Toledo,et al. Out-of-Core SVD and QR Decompositions , 2001, PPSC.
[41] James Demmel,et al. On computing condition numbers for the nonsymmetric eigenproblem , 1993, TOMS.
[42] C. Loan,et al. A Storage-Efficient $WY$ Representation for Products of Householder Transformations , 1989 .
[43] Steven G. Johnson,et al. FFTW: an adaptive software architecture for the FFT , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[44] Erik Elmroth,et al. Applying recursion to serial and parallel QR factorization leads to better performance , 2000, IBM J. Res. Dev..
[45] Ramesh C. Agarwal,et al. Exploiting functional parallelism of POWER2 to design high-performance numerical algorithms , 1994, IBM J. Res. Dev..
[46] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[47] Hanan Samet,et al. The Quadtree and Related Hierarchical Data Structures , 1984, CSUR.
[48] Isak Jonsson,et al. Parallel Two-Sided Sylvester-Type Matrix Equation Solvers for SMP Systems Using Recursive Blocking , 2002, PARA.
[49] Fred G. Gustavson,et al. New Generalized Data Structures for Matrices Lead to a Variety of High Performance Algorithms , 2001, PPAM.
[50] Bo Kågström,et al. Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues , 1998, TOMS.
[51] Jeremy D. Frens,et al. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism , 2003, PPoPP '03.
[52] J. Uhlig. C. Forsythe and C. B. Moler, Computer Solution of Linear Algebraic Systems. (Series in Automatic Computation) XI + 148 S. Englewood Cliffs, N.J. 1967. Prentice-Hall, Inc. Preis geb. 54 s. net , 1972 .
[53] James Demmel,et al. Stability of block algorithms with fast level-3 BLAS , 1992, TOMS.
[54] B. S. Garbow,et al. Matrix Eigensystem Routines — EISPACK Guide , 1974, Lecture Notes in Computer Science.
[55] Isak Jonsson,et al. Superscalar GEMM-based Level 3 BLAS - The On-going Evolution of a Portable and High-Performance Library , 1998, PARA.
[56] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[57] Gene H. Golub,et al. Matrix computations (3rd ed.) , 1996 .
[58] Nicholas J. Higham,et al. FORTRAN codes for estimating the one-norm of a real or complex matrix, with applications to condition estimation , 1988, TOMS.
[59] E. M. Wright. Solution of the equation $ze^z = a$ , 1959 .
[60] Bo Kågström,et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark , 1998, TOMS.
[61] V. Klema. LINPACK user's guide , 1980 .
[62] Bo Kågström,et al. Computing eigenspaces with specified eigenvalues of a regular matrix pair (A, B) and condition estimation: theory, algorithms and software , 1996, Numerical Algorithms.
[63] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997 .
[64] Alan J. Laub,et al. Solution of the Sylvester matrix equation AXBT + CXDT = E , 1992, TOMS.
[65] Bo Kågström,et al. LAPACK-style algorithms and software for solving the generalized Sylvester equation and estimating the separation between regular matrix pairs , 1994, TOMS.
[66] Anthony Skjellum,et al. A framework for high‐performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low‐level kernels , 2002, Concurr. Comput. Pract. Exp..
[67] Isak Jonsson,et al. Recursive Blocked Data Formats and BLAS's for Dense Linear Algebra Algorithms , 1998, PARA.
[68] Viktor K. Prasanna,et al. Tiling, Block Data Layout, and Memory Hierarchy Performance , 2003, IEEE Trans. Parallel Distributed Syst..
[69] Erik Elmroth,et al. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems , 1998, PARA.
[70] Nicholas J. Higham,et al. Perturbation Theory And Backward Error For , 1993 .
[71] K. Chu. The solution of the matrix equations AXB−CXD=E AND (YA−DZ,YC−BZ)=(E,F) , 1987 .
[72] Alston S. Householder,et al. Handbook for Automatic Computation , 1960, Comput. J..
[73] Jack J. Dongarra,et al. Software Libraries for Linear Algebra Computations on High Performance Computers , 1995, SIAM Rev..
[74] Michael S. Warren,et al. Skeletons from the treecode closet , 1994 .
[75] Gene H. Golub,et al. Matrix computations , 1983 .
[76] Karen S. Braman,et al. The Multishift QR Algorithm. Part I: Maintaining Well-Focused Shifts and Level 3 Performance , 2001, SIAM J. Matrix Anal. Appl..
[77] F. Gustavson,et al. A Faster and Simpler Recursive Algorithm for the LAPACK Routine DGELS , 2001 .
[78] G. W. Stewart,et al. Matrix algorithms , 1998 .
[79] Isak Jonsson,et al. Recursive blocked algorithms for solving triangular systems—Part I: one-sided and coupled Sylvester-type matrix equations , 2002, TOMS.
[80] Karen S. Braman,et al. The Multishift QR Algorithm. Part II: Aggressive Early Deflation , 2001, SIAM J. Matrix Anal. Appl..
[81] J. Cuppen. A divide and conquer method for the symmetric tridiagonal eigenproblem , 1980 .
[82] G. W. Stewart,et al. Matrix Algorithms: Volume 1, Basic Decompositions , 1998 .
[83] Matteo Frigo. A Fast Fourier Transform Compiler , 1999, PLDI.
[84] Bo Kågström,et al. A Perturbation Analysis of the Generalized Sylvester Equation $(AR - LB, DR - LE) = (C, F)$ , 1994 .
[85] Michael A. Heroux,et al. GEMMW: A Portable Level 3 BLAS Winograd Variant of Strassen's Matrix-Matrix Multiply Algorithm , 1994, Journal of Computational Physics.
[86] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[87] Jack J. Dongarra,et al. Algorithm 679: A set of level 3 basic linear algebra subprograms: model implementation and test programs , 1990, TOMS.
[88] Michael S. Warren,et al. Fast Parallel Tree Codes for Gravitational and Fluid Dynamical N-Body Problems , 1994, Int. J. High Perform. Comput. Appl..
[89] Jeremy D. Frens,et al. Language support for Morton-order matrices , 2001, PPoPP '01.
[90] J. H. Wilkinson,et al. Handbook for Automatic Computation. Vol II, Linear Algebra , 1973 .
[91] Peter Poromaa. Parallel Algorithms for Triangular Sylvester Equations: Design, Scheduling and Saclability Issues , 1998, PARA.
[92] Thilo Penzl,et al. Numerical solution of generalized Lyapunov equations , 1998, Adv. Comput. Math..
[93] H. Sagan. Space-filling curves , 1994 .
[94] G. Stewart,et al. Matrix Perturbation Theory , 1990 .
[95] Monica S. Lam,et al. A data locality optimizing algorithm , 1991, PLDI '91.
[96] Isak Jonsson. Analysis of Processor and Memory Utilization of Recursive Algorithms for Sylvester-Type Matrix Equations using Performance Monitoring , 2003 .
[97] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[98] K. A. Gallivan,et al. Parallel Algorithms for Dense Linear Algebra Computations , 1990, SIAM Rev..
[99] Fred G. Gustavson. New Generalized Data Structures for Matrices Lead to a Variety of High Performance Algorithms , 2001 .