Optimal Circuits for Streamed Linear Permutations Using RAM

We propose a method to automatically derive hardware structures that perform a fixed linear permutation on streaming data. Linear permutations are permutations that act as a linear map on the bit representation of the element addresses. This set contains many of the most important permutations in media processing, communication, and other applications, and includes perfect shuffles, stride permutations, and bit reversal. Streaming means that the data to be permuted arrive as a sequence of chunks over several cycles. We solve this problem by mathematically decomposing a given permutation into a sequence of three permutations, each of which is either temporal or spatial. The former are implemented as banks of RAM, the latter as switching networks. We prove that our solution is optimal in terms of the number of switches in these networks.
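As an illustration of the definition above, the following minimal Python sketch applies a linear permutation to a small array by multiplying the bit vector of each address with an invertible bit matrix over GF(2). It is not the hardware construction proposed here. The function names, the most-significant-bit-first ordering, and the convention that the element at address i moves to address P·i are assumptions made for this example only, as is the chunking into `ports` elements per cycle.

```python
def apply_bit_matrix(matrix, address):
    """Multiply a square 0/1 matrix by the bit vector of `address` over GF(2).

    Assumed convention: row 0 yields the most significant output bit and
    column 0 reads the most significant input bit.
    """
    n_bits = len(matrix)
    bits = [(address >> (n_bits - 1 - c)) & 1 for c in range(n_bits)]
    out = 0
    for row in matrix:
        bit = 0
        for m, b in zip(row, bits):
            bit ^= m & b  # addition over GF(2) is XOR
        out = (out << 1) | bit
    return out


def linear_permutation(matrix, data):
    """Move the element at address i to address matrix * i (assumed convention)."""
    if len(data) != 1 << len(matrix):
        raise ValueError("data length must be 2**n for an n x n bit matrix")
    result = [None] * len(data)
    for i, x in enumerate(data):
        result[apply_bit_matrix(matrix, i)] = x
    if any(r is None for r in result):
        raise ValueError("matrix is not invertible over GF(2)")
    return result


def to_stream(data, ports):
    """Split data into chunks of `ports` elements, one chunk per cycle."""
    return [data[c:c + ports] for c in range(0, len(data), ports)]


# Bit reversal on 8 elements: the anti-diagonal matrix reverses the 3 address bits.
BIT_REVERSAL = [[0, 0, 1],
                [0, 1, 0],
                [1, 0, 0]]

# Perfect shuffle on 8 elements: a cyclic rotation of the 3 address bits.
PERFECT_SHUFFLE = [[0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]]

reversed_data = linear_permutation(BIT_REVERSAL, list(range(8)))
shuffled_data = linear_permutation(PERFECT_SHUFFLE, list(range(8)))
shuffled_stream = to_stream(shuffled_data, ports=2)
```

Under these assumptions, `reversed_data` is `[0, 4, 2, 6, 1, 5, 3, 7]` and `shuffled_data` is `[0, 4, 1, 5, 2, 6, 3, 7]`, which interleaves the two halves of the input. With two ports, `shuffled_stream` delivers the shuffled data over four cycles.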
