Comparing fast implementations of bit permutation instructions

Recently, a number of candidate instructions have been proposed to efficiently compute arbitrary bit permutations. Among these, GRP is the most attractive: beyond permutation, it is useful for other applications such as sorting, and it has good inherent cryptographic properties. However, the current implementation of GRP is the slowest of the candidates, whereas BFLY is the fastest. In this paper, we examine the possibility of executing GRP on a butterfly or an inverse butterfly network.
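
For readers unfamiliar with GRP, the following is a minimal software sketch of its usual definition: the data bits whose control bit is 0 are gathered, in their original order, at the left end of the result, and those whose control bit is 1 are gathered at the right end. The function name `grp`, the 8-bit default width, and the bit-by-bit loop are assumptions made for illustration only; this models what the instruction computes, not the hardware implementation the paper examines.

```python
def grp(x: int, control: int, width: int = 8) -> int:
    """Illustrative model of GRP on a width-bit word (not a hardware design).

    Data bits with control bit 0 go to the left (most significant) end,
    data bits with control bit 1 go to the right end; the relative order
    inside each group is preserved.
    """
    zeros = []  # data bits selected by a 0 control bit
    ones = []   # data bits selected by a 1 control bit
    for i in range(width - 1, -1, -1):  # scan from MSB down to LSB
        bit = (x >> i) & 1
        if (control >> i) & 1:
            ones.append(bit)
        else:
            zeros.append(bit)
    result = 0
    for bit in zeros + ones:  # left group first, then right group
        result = (result << 1) | bit
    return result


# Data 1011_0010 with control 0101_0101: the left group is 1101, the right group is 0100.
print(format(grp(0b10110010, 0b01010101), "08b"))  # 11010100
```

Using the data word as its own control, as in `grp(x, x)`, moves all 0 bits to the left and all 1 bits to the right, which gives a small hint of why GRP is also of interest for sorting.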
