James Demmel | Oded Schwartz | Grey Ballard | Olga Holtz
[1] Robert A. van de Geijn, et al. FLAME: Formal Linear Algebra Methods Environment, 2001, TOMS.
[2] Robert A. van de Geijn, et al. PLAPACK: Parallel Linear Algebra Package, 1997, PPSC.
[3] James Demmel, et al. Fast linear algebra is stable, 2006, Numerische Mathematik.
[4] H. Whitney, et al. An inequality related to the isoperimetric inequality, 1949.
[5] Yousef Saad. Iterative methods for sparse linear systems, 2003.
[6] Alok Aggarwal, et al. The input/output complexity of sorting and related problems, 1988, CACM.
[7] Christian H. Bischof, et al. The WY representation for products of Householder matrices, 1985, PPSC.
[8] Lynn Elliot Cannon. A cellular computer to implement the Kalman filter algorithm, 1969.
[9] Jack Dongarra, et al. LAPACK Users' Guide, 1992.
[10] H. T. Kung, et al. I/O complexity: The red-blue pebble game, 1981, STOC '81.
[11] C. Puglisi. Modification of the Householder method based on the compact WY representation, 1992.
[12] Raphael Yuster, et al. Fast sparse matrix multiplication, 2004, TALG.
[13] Karen S. Braman, et al. The Multishift QR Algorithm. Part II: Aggressive Early Deflation, 2001, SIAM J. Matrix Anal. Appl.
[14] G. Golub, et al. Parallel block schemes for large-scale least-squares computations, 1988.
[15] James Demmel, et al. Communication-optimal Parallel and Sequential Cholesky Decomposition, 2009, SIAM J. Sci. Comput.
[16] Keshav Pingali, et al. Automatic Generation of Block-Recursive Codes, 2000, Euro-Par.
[17] Karen S. Braman, et al. The Multishift QR Algorithm. Part I: Maintaining Well-Focused Shifts and Level 3 Performance, 2001, SIAM J. Matrix Anal. Appl.
[18] Katherine Yelick, et al. OSKI: A library of automatically tuned sparse matrix kernels, 2005.
[19] Inderjit S. Dhillon, et al. Orthogonal Eigenvectors and Relative Gaps, 2003, SIAM J. Matrix Anal. Appl.
[20] Fred G. Gustavson, et al. A recursive formulation of Cholesky factorization of a matrix in packed storage, 2001, TOMS.
[21] C. Van Loan, et al. A Storage-Efficient WY Representation for Products of Householder Transformations, 1989.
[22] Erik Elmroth, et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software, 2004, SIAM Review, Vol. 46, No. 1, pp. 3–45.
[23] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[24] Jack J. Dongarra, et al. Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard (1), 2002, Int. J. High Perform. Comput. Appl.
[25] John E. Savage. Extending the Hong-Kung Model to Memory Hierarchies, 1995, COCOON.
[26] Dror Irony, et al. Communication lower bounds for distributed-memory matrix multiplication, 2004, J. Parallel Distributed Comput.
[27] Michael A. Bender, et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model, 2007, SPAA '07.
[28] James Demmel, et al. Communication Avoiding Gaussian elimination, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[29] R. Tarjan, et al. The analysis of a nested dissection algorithm, 1987.
[30] Erik Elmroth, et al. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems, 1998, PARA.
[31] James Demmel. Applied Numerical Linear Algebra, 1997.
[32] A. George. Nested Dissection of a Regular Finite Element Mesh, 1973.
[33] Erik Elmroth, et al. Applying recursion to serial and parallel QR factorization leads to better performance, 2000, IBM J. Res. Dev.
[34] Charles E. Leiserson, et al. Cache-Oblivious Algorithms, 2003, CIAC.
[35] Viktor K. Prasanna, et al. Optimizing graph algorithms for improved cache performance, 2002, IEEE Transactions on Parallel and Distributed Systems.
[36] Vijaya Ramachandran, et al. Cache-oblivious dynamic programming, 2006, SODA '06.
[37] Robert H. Halstead, et al. Matrix Computations, 2011, Encyclopedia of Parallel Computing.
[38] Inderjit S. Dhillon, et al. The design and implementation of the MRRR algorithm, 2006, TOMS.
[39] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting, 1997, SIAM J. Matrix Anal. Appl.
[40] Robert A. van de Geijn, et al. Parallel out-of-core computation and updating of the QR factorization, 2005, TOMS.
[41] D. Rose, et al. Complexity Bounds for Regular Finite Difference and Finite Element Grids, 1973.