Communication efficient gaussian elimination with partial pivoting using a shape morphing data layout
暂无分享,去创建一个
James Demmel | Sivan Toledo | Oded Schwartz | Grey Ballard | Benjamin Lipshitz | J. Demmel | Sivan Toledo | Grey Ballard | O. Schwartz | B. Lipshitz | Benjamin Lipshitz
[1] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl..
[2] Keshav Pingali,et al. Automatic Generation of Block-Recursive Codes , 2000, Euro-Par.
[3] Jeremy D. Frens,et al. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism , 2003, PPoPP '03.
[4] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[5] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[6] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[7] Jack Dongarra,et al. LAPACK's user's guide , 1992 .
[8] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[9] Jeremy D. Frens,et al. Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code , 1997, PPOPP '97.
[10] Matteo Frigo,et al. An analysis of dag-consistent distributed shared-memory algorithms , 1996, SPAA '96.
[11] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[12] Isak Jonsson,et al. Recursive Blocked Data Formats and BLAS's for Dense Linear Algebra Algorithms , 1998, PARA.
[13] James Demmel,et al. LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version , 2012, SIAM J. Matrix Anal. Appl..
[14] James Demmel,et al. Fast linear algebra is stable , 2006, Numerische Mathematik.
[15] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[16] Erik Elmroth,et al. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems , 1998, PARA.
[17] Erik Elmroth,et al. SIAM REVIEW c ○ 2004 Society for Industrial and Applied Mathematics Vol. 46, No. 1, pp. 3–45 Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software ∗ , 2022 .
[18] James Demmel,et al. Communication-optimal Parallel and Sequential Cholesky Decomposition , 2009, SIAM J. Sci. Comput..
[19] J. W. Demmel. LAPACK Working Note 53: Trading Off Parallelism and Numerical Stability , 1992 .
[20] Oded Schwartz,et al. Communication-optimal parallel and sequential Cholesky decomposition: extended abstract , 2009, SPAA.
[21] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[22] Matemática,et al. Society for Industrial and Applied Mathematics , 2010 .