A Novel Pipelined Algorithm and Modular Architecture for Non-Square Matrix Transposition

In this brief, we present a novel pipelined algorithm for transposing non-square matrices and describe the corresponding architecture. The architecture is composed of a cascade of identical basic circuits and is controlled by a simple strategy based on several counters. It is optimal in both memory and latency, achieving the theoretical minimum of each. Moreover, the proposed algorithm and architecture can be readily extended to $N$-parallel implementations of matrix transposition. The architecture supports matrices whose row and column counts are integer multiples of one another; it is mainly intended for radix-$2^{s}$ butterfly algorithms that rely on matrix transposition. Experimental results indicate that, for a $32\times 16$ matrix transposition, the proposed single-path architecture reduces the computation cycles and circuit area by 9.18% and 5.87%, respectively, compared with a recently proposed state-of-the-art architecture for matrix transposition.
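To make the single-path streaming setting concrete, the following is a minimal software reference model, not the proposed architecture. It assumes elements arrive one per cycle in row-major order and leave one per cycle in column-major order; the function names (`transpose_stream_order`, `min_latency`, `stream_transpose`) are hypothetical. Under that assumption, the smallest delay for which no element is emitted before it arrives works out to $(R-1)(C-1)$ cycles for an $R\times C$ matrix.

```python
def transpose_stream_order(rows, cols):
    """For each output slot, return the input-stream index emitted there.

    Assumption: input arrives row by row (index r*cols + c) and
    output leaves column by column (slot c*rows + r).
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def min_latency(rows, cols):
    """Smallest delay L such that output slot k (emitted at cycle k + L)
    never needs an element that has not yet arrived."""
    order = transpose_stream_order(rows, cols)
    return max(src - k for k, src in enumerate(order))


def stream_transpose(matrix):
    """Transpose a list-of-lists matrix by reordering its serial stream."""
    rows, cols = len(matrix), len(matrix[0])
    flat = [x for row in matrix for x in row]          # row-major input stream
    out = [flat[i] for i in transpose_stream_order(rows, cols)]
    # Regroup the output stream into cols rows of length rows.
    return [out[c * rows:(c + 1) * rows] for c in range(cols)]


if __name__ == "__main__":
    print(stream_transpose([[1, 2, 3], [4, 5, 6]]))
    print(min_latency(32, 16))  # equals (32 - 1) * (16 - 1)
```

This model only captures the data ordering and the causality bound on delay; it says nothing about how the cascaded circuits or counters of the architecture realize that ordering.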
