FPGA implementation of fast QR decomposition based on givens rotation

In this paper, an improved fixed-point hardware design of QR decomposition, specifically optimized for Xilinx FPGAs is introduced. A Givens Rotation algorithm is implemented by using a folded systolic array and the CORDIC algorithm, making this very suitable for high-speed FPGAs or ASIC designs. We improve the internal cell structure so that the system can run at 246MHz with nearly 24M updates per second throughout on a Virtex5 FPGA. The matrix size can be easily scaled up.