Applying Out-of-Core QR Decomposition Algorithms on FPGA-Based Systems

QR decomposition, especially via Householder transformations, is widely used to solve least squares problems. The matrices to be decomposed are often very large, frequently too large to fit in the main memory of a workstation, let alone in the internal memory of today's FPGAs. Efficient out-of-core algorithms have been developed to factor such large matrices. This paper describes the application of variants of Householder QR decomposition to FPGA-based systems. More specifically, it investigates the issues that arise when out-of-core algorithms are applied to FPGA architectures with relatively small internal memory.
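As an illustration of the general out-of-core idea (not the FPGA design described in this paper), one common approach streams a tall least squares problem in row panels small enough for the available memory. It keeps only an upper-triangular factor of the augmented matrix [A | b], stacks that factor on top of each new panel, and re-triangularizes with Householder reflections. The pure-Python sketch below assumes that row-panel updating scheme. The panel size and the `panels_of` generator are hypothetical stand-ins for on-chip memory capacity and off-chip streaming.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Matrix = List[List[float]]


def householder_triangularize(M: Matrix) -> None:
    """Reduce M to upper-triangular form in place using Householder reflections."""
    rows, cols = len(M), len(M[0])
    for k in range(min(rows - 1, cols)):
        x = [M[i][k] for i in range(k, rows)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column k is already zero from the diagonal down
        # Pick alpha with the opposite sign of x[0] to avoid cancellation in v[0].
        alpha = -norm_x if x[0] >= 0.0 else norm_x
        v = x[:]
        v[0] -= alpha
        vtv = sum(t * t for t in v)
        # H = I - 2 v v^T / (v^T v) maps column k onto alpha * e_k.
        M[k][k] = alpha
        for i in range(k + 1, rows):
            M[i][k] = 0.0
        # Apply the same reflection to the trailing columns.
        for j in range(k + 1, cols):
            f = 2.0 * sum(v[i - k] * M[i][j] for i in range(k, rows)) / vtv
            for i in range(k, rows):
                M[i][j] -= f * v[i - k]


def out_of_core_lstsq(
    panels: Iterable[Tuple[Matrix, Sequence[float]]], n: int
) -> Tuple[List[float], float]:
    """Solve min ||A x - b|| with A streamed in row panels. Only an (n+1)x(n+1)
    triangle plus one panel is held in memory at any time."""
    # Triangular factor of [A | b] for all rows seen so far (zeros initially).
    state: Matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for A_panel, b_panel in panels:
        stacked = [row[:] for row in state]
        stacked += [list(a) + [float(bi)] for a, bi in zip(A_panel, b_panel)]
        householder_triangularize(stacked)
        state = stacked[: n + 1]  # rows below n+1 are now zero and can be dropped
    # Back substitution on R x = c, with R = state[:n][:n] and c = state[:n][n].
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = state[i][n] - sum(state[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / state[i][i]
    return x, abs(state[n][n])  # |state[n][n]| equals the residual norm ||A x - b||


def panels_of(
    A: Matrix, b: Sequence[float], size: int
) -> Iterator[Tuple[Matrix, Sequence[float]]]:
    """Hypothetical stand-in for streaming row panels from off-chip memory."""
    for s in range(0, len(A), size):
        yield A[s : s + size], b[s : s + size]


if __name__ == "__main__":
    # Fit a line b ~ x0 + x1 * t while holding at most 3 data rows at a time.
    A = [[1.0, float(t)] for t in range(10)]
    b = [1.0 + 2.0 * t + (0.5 if t % 2 else -0.5) for t in range(10)]
    x, res = out_of_core_lstsq(panels_of(A, b, size=3), n=2)
    print("x =", x, "residual norm =", res)
```

Because every update is an orthogonal transformation, the result does not depend on the panel size. Only the peak working set changes, which is the property an out-of-core scheme relies on when internal memory is small.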
