Validating Word-Oriented Processors for Bit and Multi-word Operations

We examine secure computing paradigms to identify new architectural challenges for future general-purpose processors. Some essential security functions can be provided by different classes of cryptographic algorithms. We identify two categories of operations in these algorithms that are uncommon in earlier general-purpose workloads: bit operations within a word and multi-word operations. Both challenge the basic word orientation of processors. We show how very complex bit-level operations, namely arbitrary permutations of the n bits within a word, can be achieved in O(1) cycles, rather than the O(n) cycles required on existing RISC processors. We describe two solutions: one using only microarchitecture changes, and another with Instruction Set Architecture (ISA) support. We generalize our solutions to define datarich execution with MOMR (Multi-word Operands Multi-word Result) functional units. This addresses both challenges, leveraging resources already available in typical processors at minimal additional cost. Thus we validate the basic word orientation of processor architectures, since they can also provide superior performance for both the bit and multi-word operations needed by cryptographic processing.
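
To make the O(n) baseline concrete, the following minimal sketch shows how an arbitrary bit permutation is typically done in software when only word-oriented shift, mask, and OR operations are available: each of the n bits is extracted and deposited individually, so the work grows linearly with the word size. This illustrates the conventional approach only, not the O(1) solutions proposed here; the function name `permute_bits_naive` and the convention that `perm[i]` names the source bit for destination bit `i` are assumptions made for this example.

```python
def permute_bits_naive(word: int, perm: list[int], n: int = 64) -> int:
    """Return `word` with its n bits rearranged so that result bit i
    equals source bit perm[i]. One loop iteration per bit: O(n) work."""
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..n-1")
    result = 0
    for dst, src in enumerate(perm):
        bit = (word >> src) & 1   # extract one source bit (shift + mask)
        result |= bit << dst      # deposit it at its destination (shift + or)
    return result


# Usage: reverse the bits of an 8-bit value.
reverse8 = list(range(7, -1, -1))
print(bin(permute_bits_naive(0b00001011, reverse8, n=8)))
```

Each iteration costs a handful of word-level instructions, which is why an n-bit permutation takes on the order of n cycles without dedicated hardware or ISA support.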
