Minimizing Communication in Numerical Linear Algebra
James Demmel | Oded Schwartz | Grey Ballard | Olga Holtz
[1] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949 .
[2] Keshav Pingali,et al. Automatic Generation of Block-Recursive Codes , 2000, Euro-Par.
[3] Sivan Toledo,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004 .
[4] Raphael Yuster,et al. Fast sparse matrix multiplication , 2004, TALG.
[5] Yousef Saad,et al. Iterative methods for sparse linear systems , 2003 .
[6] Robert A. van de Geijn,et al. FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.
[7] Inderjit S. Dhillon,et al. Orthogonal Eigenvectors and Relative Gaps , 2003, SIAM J. Matrix Anal. Appl..
[8] James Demmel,et al. Communication-optimal Parallel and Sequential QR and LU Factorizations , 2008, SIAM J. Sci. Comput..
[9] James Demmel,et al. Graph expansion and communication costs of fast matrix multiplication: regular submission , 2011, SPAA '11.
[10] Katherine Yelick,et al. OSKI: A library of automatically tuned sparse matrix kernels , 2005 .
[11] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[12] W. Marsden. I and J , 2012 .
[13] Erik Elmroth,et al. Applying recursion to serial and parallel QR factorization leads to better performance , 2000, IBM J. Res. Dev..
[14] C. Loan,et al. A Storage-Efficient $WY$ Representation for Products of Householder Transformations , 1989 .
[15] Karen S. Braman,et al. The Multishift QR Algorithm. Part II: Aggressive Early Deflation , 2001, SIAM J. Matrix Anal. Appl..
[16] Erik Elmroth,et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software , 2004, SIAM Review, Vol. 46, No. 1, pp. 3–45.
[17] James Demmel,et al. Communication avoiding Gaussian elimination , 2008, HiPC 2008.
[18] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[19] Alok Aggarwal,et al. The input/output complexity of sorting and related problems , 1988, CACM.
[20] Fred G. Gustavson,et al. Recursion leads to automatic variable blocking for dense linear-algebra algorithms , 1997, IBM J. Res. Dev..
[21] James Demmel,et al. LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version , 2012, SIAM J. Matrix Anal. Appl..
[22] Christian H. Bischof,et al. The WY representation for products of householder matrices , 1985, PPSC.
[23] Erik Elmroth,et al. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems , 1998, PARA.
[24] Inderjit S. Dhillon,et al. The design and implementation of the MRRR algorithm , 2006, TOMS.
[25] J. Bunch,et al. Some stable methods for calculating inertia and solving symmetric linear systems , 1977 .
[26] C. Puglisi. Modification of the householder method based on the compact WY representation , 1992 .
[27] Viktor K. Prasanna,et al. Optimizing graph algorithms for improved cache performance , 2004, Proceedings 16th International Parallel and Distributed Processing Symposium.
[28] Vijaya Ramachandran,et al. Cache-oblivious dynamic programming , 2006, SODA '06.
[29] R. K. Shyamasundar,et al. Introduction to algorithms , 1996 .
[30] Fred G. Gustavson,et al. A recursive formulation of Cholesky factorization of a matrix in packed storage , 2001, TOMS.
[31] Alexander Tiskin,et al. Memory-Efficient Matrix Multiplication in the BSP Model , 1999, Algorithmica.
[32] Christian H. Bischof,et al. A framework for symmetric band reduction , 2000, TOMS.
[33] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[34] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl..
[35] Robert A. van de Geijn,et al. Parallel out-of-core computation and updating of the QR factorization , 2005, TOMS.
[36] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[37] Sartaj Sahni,et al. Parallel Matrix and Graph Algorithms , 1981, SIAM J. Comput..
[38] Viktor K. Prasanna,et al. Optimizing graph algorithms for improved cache performance , 2002, IEEE Transactions on Parallel and Distributed Systems.
[39] William Gropp,et al. Hybrid Static/dynamic Scheduling for Already Optimized Dense Matrix Factorization , 2011, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[40] Cleve Ashcraft,et al. The Fan-Both Family of Column-Based Distributed Cholesky Factorization Algorithms , 1993 .
[41] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[42] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1987 .
[43] D. Rose,et al. Complexity Bounds for Regular Finite Difference and Finite Element Grids , 1973 .
[44] Dror Irony,et al. Trading Replication for Communication in Parallel Distributed-Memory Dense Solvers , 2002 .
[45] Jack J. Dongarra,et al. Basic Linear Algebra Subprograms Technical (Blast) Forum Standard (1) , 2002, Int. J. High Perform. Comput. Appl..
[46] John E. Savage. Extending the Hong-Kung Model to Memory Hierarchies , 1995, COCOON.
[47] James Demmel,et al. Applied Numerical Linear Algebra , 1997 .
[48] G. Golub,et al. Parallel block schemes for large-scale least-squares computations , 1988 .
[49] Laura Grigori,et al. Adapting communication-avoiding LU and QR factorizations to multicore architectures , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[50] A. George. Nested Dissection of a Regular Finite Element Mesh , 1973 .
[51] Y. Saad,et al. Communication complexity of the Gaussian elimination algorithm on multiprocessors , 1986 .
[52] James Demmel,et al. Brief announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem , 2010, SPAA '10.
[53] Christian H. Bischof,et al. Algorithm 807: The SBR Toolbox—software for successive band reduction , 2000, TOMS.
[54] Cleve Ashcraft. The fan-both family of column-based distributed Cholesky factorization algorithms , 1993 .
[55] Gene H. Golub,et al. Matrix computations , 1983 .
[56] Karen S. Braman,et al. The Multishift QR Algorithm. Part I: Maintaining Well-Focused Shifts and Level 3 Performance , 2001, SIAM J. Matrix Anal. Appl..
[57] James Demmel,et al. Graph expansion and communication costs of fast matrix multiplication , 2013 .
[58] Jack Dongarra,et al. Preface: Basic Linear Algebra Subprograms Technical (Blast) Forum Standard , 2002 .
[59] Michael A. Bender,et al. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model , 2007, SPAA '07.
[60] R. Tarjan,et al. The analysis of a nested dissection algorithm , 1987 .
[61] Jeremy D. Frens,et al. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism , 2003, PPoPP '03.
[62] James Demmel,et al. Minimizing Communication in Linear Algebra , 2009, ArXiv.
[63] Jack Dongarra,et al. Basic Linear Algebra Subprograms (BLAS) , 2011, Encyclopedia of Parallel Computing.
[64] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[65] Dror Irony,et al. Trading Replication for Communication in Parallel Distributed-Memory Dense Solvers , 2002, Parallel Process. Lett..
[66] Jack Dongarra,et al. LAPACK Users' Guide , 1992 .
[67] James Demmel,et al. Fast linear algebra is stable , 2006, Numerische Mathematik.
[68] J. Demmel,et al. Implementing Communication-Optimal Parallel and Sequential QR Factorizations , 2008, 0809.2407.
[69] James Demmel,et al. Graph Expansion and Communication Costs of Algorithms , 2010 .
[70] Robert A. van de Geijn,et al. PLAPACK: Parallel Linear Algebra Package , 1997, PPSC.
[71] Lynn Elliot Cannon,et al. A cellular computer to implement the kalman filter algorithm , 1969 .
[72] Thomas H. Cormen,et al. Introduction to algorithms [2nd ed.] , 2001 .
[73] James Demmel,et al. Communication-optimal Parallel and Sequential Cholesky Decomposition , 2009, SIAM J. Sci. Comput..