High-Throughput FPGA Implementation of QR Decomposition

This brief presents a hardware design that achieves high-throughput QR decomposition using the Givens rotation method. It uses a new 2-D systolic array architecture with pipelined processing elements based on the COordinate Rotation DIgital Computer (CORDIC) algorithm, which computes vector rotations using only shifts and additions. This approach allows QR factorizations to be computed continuously with simple hardware. A fixed-point field-programmable gate array (FPGA) architecture for 4 × 4 matrices has been optimized by balancing the number of CORDIC iterations against the final error. Compared with previous FPGA proposals, our design achieves at least 50% more throughput while using substantially fewer resources.
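To make the role of CORDIC in Givens-based QR decomposition concrete, the following is a minimal software sketch, not the architecture proposed in this brief. It assumes a floating-point model (the hardware is fixed-point), a default of 16 iterations, and hypothetical function names (`cordic_gain`, `vectoring`, `micro_rotations`, `givens_r`). Vectoring mode zeroes one matrix entry and records the rotation directions; the same directions are then replayed on the remaining entries of the two rows, loosely mirroring how boundary and internal cells cooperate in a systolic array.

```python
import math

def cordic_gain(n_iter):
    """Accumulated scale factor of n_iter CORDIC micro-rotations."""
    g = 1.0
    for i in range(n_iter):
        g *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return g

def vectoring(x, y, n_iter):
    """Rotate (x, y) onto the positive x axis.

    Returns the gain-compensated magnitude, a flag for the initial
    180-degree pre-rotation, and the list of direction choices.
    """
    flip = x < 0
    if flip:  # bring the vector into the right half-plane
        x, y = -x, -y
    dirs = []
    for i in range(n_iter):
        d = 1 if y < 0 else -1  # choose the direction that drives y toward 0
        # Multiplying by 2**-i models an arithmetic right shift in hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return x / cordic_gain(n_iter), flip, dirs

def micro_rotations(x, y, flip, dirs):
    """Replay the recorded rotation on another (x, y) pair."""
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    g = cordic_gain(len(dirs))
    return x / g, y / g

def givens_r(a, n_iter=16):
    """Return an approximate upper-triangular R for a square matrix a."""
    r = [row[:] for row in a]
    n = len(r)
    for j in range(n):
        for i in range(j + 1, n):
            # Zero r[i][j] against the diagonal element r[j][j].
            r[j][j], flip, dirs = vectoring(r[j][j], r[i][j], n_iter)
            r[i][j] = 0.0  # the small residual left by CORDIC is discarded
            for k in range(j + 1, n):
                r[j][k], r[i][k] = micro_rotations(r[j][k], r[i][k], flip, dirs)
    return r

if __name__ == "__main__":
    a = [[0.5, -0.25, 0.75, 0.1],
         [0.3, 0.9, -0.4, 0.2],
         [-0.6, 0.1, 0.2, 0.8],
         [0.7, -0.5, 0.3, -0.9]]
    for row in givens_r(a):
        print(["%.5f" % v for v in row])
```

In this model, lowering `n_iter` shortens each rotation but leaves a larger residual below the diagonal, which illustrates the kind of iteration-versus-error trade-off the abstract refers to.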
