Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling
Zizhong Chen | Lan Lin | Fengguang Song | Weijian Zheng
[1] Marc Casas, et al. Iteration-fusing conjugate gradient, 2017, ICS.
[2] James Demmel, et al. Reconstructing Householder Vectors from Tall-Skinny QR, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[3] Åke Björck. Numerical Methods for Least Squares Problems, 1996.
[4] Emmanuel Agullo, et al. Task-Based Conjugate Gradient: From Multi-GPU Towards Heterogeneous Architectures, 2016, Euro-Par Workshops.
[5] James Demmel, et al. Minimizing communication in sparse matrix solvers, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[6] James Demmel, et al. Communication-optimal Parallel and Sequential QR and LU Factorizations, 2008, SIAM J. Sci. Comput.
[7] Ed Anderson, et al. LAPACK Users' Guide, 1995.
[8] Jack J. Dongarra, et al. A scalable approach to solving dense linear algebra problems on hybrid CPU‐GPU systems, 2015, Concurr. Comput. Pract. Exp.
[9] James Demmel, et al. Applied Numerical Linear Algebra, 1997.
[10] Jack J. Dongarra, et al. Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, 2012, ICS '12.
[11] W. Morven Gentleman, et al. Row elimination for solving sparse linear systems and least squares problems, 1976.
[12] James Demmel, et al. LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version, 2012, SIAM J. Matrix Anal. Appl.
[13] Jack J. Dongarra, et al. Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Jack J. Dongarra, et al. Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[15] J. Dongarra, et al. Generalized QR factorization and its applications, 1992.
[16] Mark Hoemmen, et al. A Communication-Avoiding, Hybrid-Parallel, Rank-Revealing Orthogonalization Method, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[17] Padma Raghavan, et al. Distributed Orthogonal Factorization, 1989.
[18] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, PARA.