A 3D Parallel Algorithm for QR Decomposition
[1] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007 .
[2] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[3] Torsten Hoefler,et al. Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations , 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[4] James Demmel,et al. Communication-optimal Parallel and Sequential QR and LU Factorizations , 2008, SIAM J. Sci. Comput..
[5] Leslie G. Valiant,et al. A bridging model for parallel computation , 1990, CACM.
[6] Alston S. Householder,et al. Unitary Triangularization of a Nonsymmetric Matrix , 1958, JACM.
[7] James Demmel,et al. Reconstructing Householder Vectors from Tall-Skinny QR , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[8] Oded Schwartz,et al. Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication , 2016, TOPC.
[9] James Demmel,et al. A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem , 2016, SPAA.
[10] Robert A. van de Geijn,et al. Collective communication: theory, practice, and experience , 2007, Concurr. Comput. Pract. Exp..
[11] James Demmel,et al. Trade-Offs Between Synchronization, Communication, and Computation in Parallel Linear Algebra Computations , 2016 .
[12] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[13] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[14] Martin D. Schatz,et al. Parallel Matrix Multiplication: A Systematic Journey , 2016, SIAM J. Sci. Comput..
[15] Erik Elmroth,et al. Applying recursion to serial and parallel QR factorization leads to better performance , 2000, IBM J. Res. Dev..
[16] Alexander Tiskin. Communication-efficient parallel generic pairwise elimination , 2007, Future Gener. Comput. Syst..
[17] David A. Bader,et al. Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract) , 1996, SPAA '96.
[18] C. Puglisi. Modification of the Householder method based on the compact WY representation , 1992 .
[19] David F. Gleich,et al. Tall and skinny QR factorizations in MapReduce architectures , 2011, MapReduce '11.
[20] James Demmel,et al. Communication lower bounds and optimal algorithms for numerical linear algebra , 2014, Acta Numerica.
[21] Christos H. Papadimitriou,et al. A Communication-Time Tradeoff , 1987, SIAM J. Comput..
[22] C. Loan,et al. A Storage-Efficient WY Representation for Products of Householder Transformations , 1989 .
[23] Jack J. Dongarra,et al. Improving the Performance of CA-GMRES on Multicores with Multiple GPUs , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[24] Christian H. Bischof,et al. A Basis-Kernel Representation of Orthogonal Matrices , 1995, SIAM J. Matrix Anal. Appl..
[25] Erik Elmroth,et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software , 2004, SIAM Rev..
[26] Roland W. Freund,et al. Computing Fundamental Matrix Decompositions Accurately via the Matrix Sign Function in Two Iterations: The Power of Zolotarev's Functions , 2016, SIAM Rev..
[27] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..