Single-Cycle Bit Permutations with MOMR Execution

Secure computing paradigms impose new architectural challenges for general-purpose processors. Cryptographic processing is needed for secure communications, storage, and computations. We identify two categories of operations in symmetric-key and public-key cryptographic algorithms that were uncommon in earlier general-purpose workloads: advanced bit operations within a word, and multi-word operations. We define MOMR (Multiple Operands, Multiple Results) execution, also called datarich execution, as a unified solution to both challenges. It allows arbitrary n-bit permutations to be achieved in one or two cycles, rather than the O(n) cycles required on existing RISC processors. It also enables significant acceleration of the multi-word multiplications needed by public-key ciphers. We propose two implementations of MOMR: one requires only hardware changes, while the other relies on Instruction Set Architecture (ISA) support. We show that MOMR execution leverages resources already available in typical multi-issue processors, at minimal additional cost. Multi-issue processors enhanced with MOMR units provide additional speedup over standard multi-issue processors with the same datapath. MOMR is a general architectural solution that lets word-oriented processor architectures incorporate datarich operations.
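
To make the two operation categories concrete, the following is a minimal software sketch of the baseline that the abstract contrasts against. It is not the MOMR design itself. The function names `permute_bits` and `multiword_mul`, the permutation encoding (destination bit i takes source bit `perm[i]`), and the little-endian limb layout are illustrative assumptions, not taken from the paper. The first function performs one extract-and-deposit step per bit, which is the O(n) behavior of a processor without permutation support; the second is a schoolbook multi-word multiplication in which every partial product consumes several operands and yields two results (a low word and a carry).

```python
from typing import List


def permute_bits(word: int, perm: List[int]) -> int:
    """Bit-by-bit permutation: result bit i is taken from bit perm[i] of word.

    Hypothetical baseline: one extract/deposit step per bit, so the loop
    runs n times for an n-bit permutation.
    """
    result = 0
    for dst, src in enumerate(perm):
        bit = (word >> src) & 1      # extract the source bit
        result |= bit << dst         # deposit it at the destination
    return result


def multiword_mul(a: List[int], b: List[int], w: int = 32) -> List[int]:
    """Schoolbook product of two little-endian arrays of w-bit words.

    Each inner step reads a word of a, a word of b, an accumulator word,
    and a carry, and produces a low word plus a new carry.
    """
    mask = (1 << w) - 1
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = ai * bj + out[i + j] + carry
            out[i + j] = t & mask    # low word of the partial sum
            carry = t >> w           # high part carried to the next word
        out[i + len(b)] = carry
    return out


if __name__ == "__main__":
    n = 8
    reverse = [n - 1 - i for i in range(n)]   # bit reversal as a sample permutation
    print(bin(permute_bits(0b00000001, reverse)))
    print([hex(x) for x in multiword_mul([0xFFFFFFFF, 0xFFFFFFFF], [0x2])])
```

The per-bit loop and the operand-heavy inner multiplication step are the two costs that the abstract says datarich execution is meant to reduce; how MOMR units achieve this is described in the paper body, not in this sketch.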
