Efficient and scalable CGRA-based implementation of Column-wise Givens Rotation

Givens Rotation is a key computation-intensive block in embedded wireless applications. To achieve an efficient mapping that scales smoothly with the underlying architecture, we propose two new Column-based Givens Rotation algorithms, derived from the traditional Fast Givens and Square-root and Division Free Givens algorithms. These algorithms annihilate multiple elements in a column of the input matrix simultaneously, without a dependency bottleneck, which allows increased parallelism, resource sharing and scalability. The ease of mapping and the scalability have been tested on a layered coarse-grained reconfigurable architecture, reaching close to optimal results for highly parallel architectures.
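
To make the dependency argument concrete, the following is a minimal Python sketch, not the algorithms proposed in the paper. It uses the classical (square-root based) Givens rotation rather than the Fast Givens or Square-root and Division Free variants, and the function names `givens`, `annihilate_column` and `column_params` are hypothetical. The first routine zeroes a column one element at a time, so each rotation waits for the previous one. The second computes the same rotation parameters for the whole column from suffix norms of the original entries, which is one way the sequential dependency can be removed.

```python
import math


def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] applied to (a, b) giving (r, 0)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0  # nothing to annihilate
    return a / r, b / r


def annihilate_column(A, col=0):
    """Classical approach: zero the entries below row `col` in column `col`,
    bottom-up, one rotation at a time. Returns (rotated copy, parameters)."""
    A = [row[:] for row in A]
    params = []
    for i in range(len(A) - 1, col, -1):
        # This rotation needs the value produced by the previous rotation.
        c, s = givens(A[i - 1][col], A[i][col])
        params.append((c, s))
        for j in range(len(A[i])):
            top, bot = A[i - 1][j], A[i][j]
            A[i - 1][j] = c * top + s * bot
            A[i][j] = -s * top + c * bot
    return A, params


def column_params(x):
    """Assumed column-wise view: derive every (c, s) pair from suffix norms
    t[k] = sqrt(x[k]^2 + ... + x[n-1]^2) of the ORIGINAL column entries."""
    n = len(x)
    t = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        t[k] = math.hypot(x[k], t[k + 1])
    params = []
    for i in range(n - 1, 0, -1):
        # The bottom entry keeps its sign; every later partner is a norm.
        b = x[i] if i == n - 1 else t[i]
        if t[i - 1] == 0.0:
            params.append((1.0, 0.0))
        else:
            params.append((x[i - 1] / t[i - 1], b / t[i - 1]))
    return params
```

In this sketch the suffix norms can be formed by a prefix-sum style reduction, after which all rotation parameters are independent of each other. How the paper's algorithms organise and share these computations on the reconfigurable architecture is not shown here.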
