Communication-optimal Parallel and Sequential QR and LU Factorizations
James Demmel | Laura Grigori | Julien Langou | Mark Hoemmen
[1] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949 .
[2] Å. Björck. Solving linear least squares problems by Gram-Schmidt orthogonalization , 1967 .
[3] V. Strassen. Gaussian elimination is not optimal , 1969 .
[4] Lynn Elliot Cannon,et al. A cellular computer to implement the Kalman filter algorithm , 1969 .
[5] N. Abdelmalek. Round off error analysis for Gram-Schmidt method and solution of linear least squares problems , 1971 .
[6] A. Kiełbasiński. Analiza numeryczna algorytmu ortogonalizacji Grama-Schmidta , 1974 .
[7] L. Csanky,et al. Fast parallel matrix inversion algorithms , 1975, 16th Annual Symposium on Foundations of Computer Science (sfcs 1975).
[8] David J. Kuck,et al. On Stable Parallel Linear System Solvers , 1978, JACM.
[9] B. Parlett. The Symmetric Eigenvalue Problem , 1981 .
[10] D. O’Leary. The block conjugate gradient algorithm and related methods , 1980 .
[11] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[12] Yves Robert,et al. Complexité de la factorisation QR en parallèle , 1982 .
[13] Don Coppersmith,et al. On the Asymptotic Complexity of Matrix Multiplication , 1982, SIAM J. Comput..
[14] Gene H. Golub,et al. Matrix computations , 1983 .
[15] J. J. Modi,et al. An alternative Givens ordering , 1984 .
[16] M. Cosnard,et al. Parallel QR decomposition of a rectangular matrix , 1986 .
[17] Ed Anderson,et al. LAPACK Users' Guide , 1995 .
[18] Jack Dongarra,et al. ScaLAPACK Users' Guide , 1987 .
[19] G. Golub,et al. Parallel block schemes for large-scale least-squares computations , 1988 .
[20] Robert B. Wilhelmson. High-speed computing: scientific applications and algorithm design , 1988 .
[21] C. Loan,et al. A Storage-Efficient $WY$ Representation for Products of Householder Transformations , 1989 .
[22] B. Vital. Etude de quelques methodes de resolution de problemes lineaires de grande taille sur multiprocesseur , 1990 .
[23] J. Demmel. Trading Off Parallelism and Numerical Stability , 1992 .
[24] E. Ng,et al. Predicting structure in nonsymmetric sparse matrix factorizations , 1993 .
[25] H. Sagan. Space-filling curves , 1994 .
[26] Anthony Skjellum,et al. Using MPI - portable parallel programming with the message-passing interface , 1994 .
[27] Jaeyoung Choi,et al. Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines , 1994, Sci. Program..
[28] R. Freund,et al. A block QMR algorithm for non-Hermitian linear systems with multiple right-hand sides , 1997 .
[29] Jack J. Dongarra,et al. Key Concepts for Parallel Out-of-Core LU Factorization , 1996, Parallel Comput..
[30] Jack Dongarra,et al. The Design and Implementation of the Parallel Out-of-core ScaLAPACK LU, QR, and Cholesky Factorization Routines , 1997 .
[31] M. Rozložník,et al. Numerical behaviour of the modified Gram-Schmidt GMRES implementation , 1997 .
[32] Sivan Toledo. Locality of Reference in LU Decomposition with Partial Pivoting , 1997, SIAM J. Matrix Anal. Appl..
[33] Sivan Toledo,et al. A survey of out-of-core algorithms in numerical linear algebra , 1999, External Memory Algorithms.
[34] Erik Elmroth,et al. New Serial and Parallel Recursive QR Factorization Algorithms for SMP Systems , 1998, PARA.
[35] James Demmel,et al. LAPACK Users' Guide, Third Edition , 1999, Software, Environments and Tools.
[36] Mauro Leoncini,et al. Parallel Complexity of Numerically Accurate Linear System Solvers , 1999, SIAM J. Comput..
[37] Jack Dongarra,et al. Templates for the Solution of Algebraic Eigenvalue Problems , 2000, Software, environments, tools.
[38] Erik Elmroth,et al. Applying recursion to serial and parallel QR factorization leads to better performance , 2000, IBM J. Res. Dev..
[39] Jack J. Dongarra,et al. The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines , 2000, Concurr. Pract. Exp..
[40] Sivan Toledo,et al. Out-of-Core SVD and QR Decompositions , 2001, PPSC.
[41] Rudnei Dias da Cunha,et al. New Parallel (Rank-Revealing) QR Factorization Algorithms , 2002, Euro-Par.
[42] Kesheng Wu,et al. A Block Orthogonalization Procedure with Constant Synchronization Requirements , 2000, SIAM J. Sci. Comput..
[43] Ran Raz,et al. On the complexity of matrix product , 2002, STOC '02.
[44] Lothar Reichel,et al. Algorithm 827: irbleigs: A MATLAB program for computing a few eigenpairs of a large sparse Hermitian matrix , 2003, TOMS.
[45] Jeremy D. Frens,et al. QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism , 2003, PPoPP '03.
[46] Erik Elmroth,et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software , 2004, SIAM Review, Vol. 46, No. 1, pp. 3–45 .
[47] Marc Snir,et al. Getting Up to Speed: The Future of Supercomputing , 2004 .
[48] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[49] Robert A. van de Geijn,et al. Parallel out-of-core computation and updating of the QR factorization , 2005, TOMS.
[50] Y. Danieli. Guide , 2005 .
[51] Julien Langou,et al. A note on the error analysis of classical Gram–Schmidt , 2006, Numerische Mathematik.
[52] Richard B. Lehoucq,et al. Basis selection in LOBPCG , 2006, J. Comput. Phys..
[53] Merico E. Argentati,et al. Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) in hypre and PETSc , 2007, SIAM J. Sci. Comput..
[54] James Demmel,et al. Fast linear algebra is stable , 2006, Numerische Mathematik.
[55] Jack Dongarra,et al. Parallel tiled QR factorization for multicore architectures , 2008 .
[56] Jack Dongarra,et al. QR Factorization for the CELL Processor , 2008 .
[57] Robert A. van de Geijn,et al. Design of scalable dense linear algebra libraries for multithreaded architectures: the LU factorization , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[58] James Demmel,et al. Communication Avoiding Gaussian elimination , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[59] James Demmel,et al. Communication-avoiding parallel and sequential QR factorizations , 2008, ArXiv.
[60] Robert A. van de Geijn,et al. Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures , 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).
[61] Jack Dongarra,et al. Some issues in dense linear algebra for multicore and special purpose architectures , 2008 .
[62] Julien Langou,et al. Parallel tiled QR factorization for multicore architectures , 2007, Concurr. Comput. Pract. Exp..
[63] J. Demmel,et al. Implementing Communication-Optimal Parallel and Sequential QR Factorizations , 2008, 0809.2407.
[64] George Almási,et al. Performance without pain = productivity: data layout and collective communication in UPC , 2008, PPoPP.
[65] James Demmel,et al. Nonnegative Diagonals and High Performance on Low-Profile Matrices from Householder QR , 2009, SIAM J. Sci. Comput..
[66] Julien Langou,et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures , 2007, Parallel Comput..
[67] James Demmel,et al. Minimizing communication in sparse matrix solvers , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[68] S. Gratton,et al. Parallel Tools for Solving Incremental Dense Least Squares Problems: Application to Space Geodesy , 2009 .
[69] Mark Hoemmen,et al. Communication-avoiding Krylov subspace methods , 2010 .
[70] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[71] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..