Paper information - Tiled QR factorization algorithms

Tiled QR factorization algorithms

This work revisits existing algorithms for the QR factorization of rectangular matrices composed of $p \times q$ tiles, where $p \geq q$. Within this framework, we study the critical paths and performance of algorithms such as SAMEH-KUCK, FIBONACCI, GREEDY, and those found within PLASMA. Although neither FIBONACCI nor GREEDY is optimal, both are shown to be asymptotically optimal for all matrices of size $p = q^2 f(q)$, where $f$ is any function such that $\lim_{+\infty} f = 0$. This novel and important complexity result applies to all matrices where $p$ and $q$ are proportional, $p = \lambda q$, with $\lambda \geq 1$, thereby encompassing many important situations in practice (least squares). We provide an extensive set of experiments that show the superiority of the new algorithms for tall matrices.
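
To see why the proportional case falls under the stated size condition, one can rewrite $p = \lambda q$ in the form $p = q^2 f(q)$. The particular choice of $f$ below is an illustration worked out here, not a formula quoted from the paper:

```latex
% Proportional case: p = \lambda q with a fixed \lambda \geq 1.
% Choose f(q) = \lambda / q (an illustrative choice, not from the paper).
\[
  p \;=\; \lambda q \;=\; q^{2} \cdot \frac{\lambda}{q} \;=\; q^{2} f(q),
  \qquad
  f(q) = \frac{\lambda}{q} \xrightarrow[q \to +\infty]{} 0 .
\]
```

Since this $f$ tends to 0 for any fixed $\lambda$, every proportional shape satisfies the hypothesis of the asymptotic optimality result.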

[1] Robert A. van de Geijn, et al. Programming matrix algorithms-by-blocks for thread-level parallelism, 2009, TOMS.

[2] J. J. Modi, et al. An alternative Givens ordering, 1984.

[3] Thomas Hérault, et al. QR factorization of tall and skinny matrices in a grid computing environment, 2009, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[4] Jack Dongarra, et al. Scheduling dense linear algebra operations on multicore processors, 2010.

[5] Emmanuel Agullo, et al. Tile QR factorization with parallel panel processing for multicore architectures, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[6] Emmanuel Agullo, et al. Comparative study of one-sided factorizations with multiple software packages on multi-core hardware, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[7] Yves Robert, et al. Complexity of parallel QR factorization, 1986, JACM.

[8] Samuel Williams, et al. Roofline: an insightful visual performance model for multicore architectures, 2009, CACM.

[9] Julien Langou, et al. Parallel tiled QR factorization for multicore architectures, 2007, Concurr. Comput. Pract. Exp.

[10] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.

[11] Jack Dongarra, et al. Parallel tiled QR factorization for multicore architectures, 2008.

[12] David J. Kuck, et al. On Stable Parallel Linear System Solvers, 1978, JACM.

[13] Jack Dongarra, et al. Enhancing Parallelism of Tile QR Factorization for Multicore Architectures, 2010.

[14] M. Cosnard, et al. Parallel QR decomposition of a rectangular matrix, 1986.

[15] Emmanuel Agullo, et al. A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures, 2011, Euro-Par.

[16] James Demmel, et al. Communication-optimal Parallel and Sequential QR and LU Factorizations, 2008, SIAM J. Sci. Comput.

[17] James Demmel, et al. Minimizing communication in sparse matrix solvers, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[18] R. Clint Whaley, et al. Achieving accurate and context-sensitive timing for code optimization, 2008, Softw. Pract. Exp.