Arbitrary bit permutations in one or two cycles

Symmetric-key block ciphers encrypt data, providing data confidentiality over the public Internet. For interoperability reasons, it is desirable to support a variety of symmetric-key ciphers efficiently. We show the basic operations performed by a variety of symmetric-key cryptography algorithms. Of these basic operations, only bit permutation is very slow using existing processors, followed by integer multiplication. New instructions have been proposed recently to accelerate bit permutations in general-purpose processors, reducing the instructions needed to achieve an arbitrary n-bit permutation from O(n) to O(log(n)). However, the serial data-dependency between these log(n) permutation instructions prevents them from being executed in fewer than log(n) cycles, even on superscalar processors. Since application specific instruction processors (ASIPs) have fewer constraints on maintaining standard processor datapath and control conventions, can we achieve even faster permutations? We propose six alternative ASIP approaches to achieve arbitrary 64 bit permutations in one or two cycles, using new BFLY and IBFLY instructions. This reduction to one or two cycles is achieved without increasing the cycle time. We compare the latencies of different permutation units in a technology independent way to estimate cycle time impact. We also compare the alternative ASIP architectures and their efficiency in performing arbitrary 64 bit permutations.

[1]  Shai Halevi,et al.  MARS - a candidate cipher for AES , 1999 .

[2]  Ruby B. Lee,et al.  Subword sorting with versatile permutation instructions , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[3]  Ruby B. Lee,et al.  Efficient permutation instructions for fast software cryptography , 2001 .

[4]  Nobu Matsumoto,et al.  Design methodology and system for a configurable media embedded processor extensible to VLIW architecture , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[5]  Ruby B. Lee,et al.  Bit permutation instructions for accelerating software cryptography , 2000, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors.

[6]  Ruby B. Lee,et al.  Fast subword permutation instructions based on butterfly network , 1999, Electronic Imaging.

[7]  Ruby B. Lee,et al.  Fast subword permutation instructions using omega and flip network stages , 2000, Proceedings 2000 International Conference on Computer Design.

[8]  Rakesh Krishnaiyer,et al.  An Advanced Optimizer for the IA-64 Architecture , 2000, IEEE Micro.

[9]  Claude E. Shannon,et al.  Communication theory of secrecy systems , 1949, Bell Syst. Tech. J..

[10]  Ricardo E. Gonzalez,et al.  Xtensa: A Configurable and Extensible Processor , 2000, IEEE Micro.

[11]  Ruby B. Lee,et al.  Architectural enhancements for fast subword permutations with repetitions in cryptographic applications , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.