Fixed-Point CORDIC-Based QR Decomposition by Givens Rotations on FPGA

This paper presents a parallel architecture for a QR decomposition systolic array based on the Givens rotation algorithm, implemented on an FPGA. The proposed architecture directly maps the algorithm onto 21 fixed-point CORDIC-based processing units that compute the QR decomposition of a 4×4 real matrix. To provide a comprehensive resource and performance evaluation, the computational error, the resource utilization, and the achieved speed on a Virtex-5 XC5VTX150T FPGA are evaluated for different intermediate word lengths. The evaluation results show that the proposed systolic array 1) performs the 4×4 QR decomposition correctly in 99.9% of cases under a 2^-13 accuracy requirement when the data-path word length is larger than 25 bits, 2) occupies about 2,810 (13%) slices, and 3) achieves about 2.06 M updates/sec when running at its maximum frequency of 111 MHz.
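As an illustration of the underlying technique, the following is a minimal software sketch of Givens-rotation QR decomposition in which every rotation is carried out by fixed-point CORDIC shift-and-add steps. It is a sequential model, not the 21-unit systolic array described above. The names `FRAC_BITS`, `N_ITER`, `vectoring`, `rotation`, and `qr_givens_cordic` are hypothetical, the word-length and iteration settings are assumed values rather than the paper's, and Python integers do not overflow, so only fractional quantization is modeled.

```python
import math

FRAC_BITS = 20  # assumed number of fractional bits (not taken from the paper)
N_ITER = 20     # assumed number of CORDIC micro-rotations

def to_fixed(x):
    return int(round(x * (1 << FRAC_BITS)))

def to_float(v):
    return v / (1 << FRAC_BITS)

# Reciprocal of the accumulated CORDIC gain, used to rescale the outputs.
_gain = 1.0
for _i in range(N_ITER):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = to_fixed(1.0 / _gain)

def rescale(v):
    return (v * K_FIXED) >> FRAC_BITS

def micro_rotate(x, y, i, d):
    # One shift-and-add step: rotate (x, y) by d * atan(2^-i), unscaled.
    return x - d * (y >> i), y + d * (x >> i)

def vectoring(x, y):
    """Drive y to zero; return the magnitude and the rotation directions."""
    flip = x < 0          # pre-rotate by pi so the vector lies in x >= 0
    if flip:
        x, y = -x, -y
    dirs = []
    for i in range(N_ITER):
        d = 1 if y < 0 else -1
        x, y = micro_rotate(x, y, i, d)
        dirs.append(d)
    return rescale(x), flip, dirs

def rotation(x, y, flip, dirs):
    """Apply the rotation recorded by vectoring() to another (x, y) pair."""
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = micro_rotate(x, y, i, d)
    return rescale(x), rescale(y)

def qr_givens_cordic(a):
    """Return (Q, R) as float matrices such that Q * R approximates a."""
    n = len(a)
    r = [[to_fixed(v) for v in row] for row in a]
    qt = [[to_fixed(1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for j in range(k + 1, n):
            # Zero r[j][k] against the pivot r[k][k].
            r[k][k], flip, dirs = vectoring(r[k][k], r[j][k])
            r[j][k] = 0  # the small residual is discarded in this model
            for c in range(k + 1, n):
                r[k][c], r[j][c] = rotation(r[k][c], r[j][c], flip, dirs)
            for c in range(n):
                qt[k][c], qt[j][c] = rotation(qt[k][c], qt[j][c], flip, dirs)
    q = [[to_float(qt[j][i]) for j in range(n)] for i in range(n)]
    return q, [[to_float(v) for v in row] for row in r]

# Example with an arbitrary 4x4 real matrix whose entries lie in [-1, 1].
A = [[0.5, -0.25, 0.125, 0.75],
     [-0.6, 0.3, 0.9, -0.1],
     [0.2, 0.8, -0.4, 0.35],
     [-0.7, 0.15, 0.45, 0.55]]
Q, R = qr_givens_cordic(A)
```

Lowering `FRAC_BITS` or `N_ITER` in this sketch gives a rough feel for how the reconstruction error of Q·R grows as the intermediate word length shrinks, which is the kind of trade-off the evaluation above quantifies on hardware.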
