A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations

This paper describes a new basis for implementing the shifter functional unit in microprocessors, one that supports new advanced bit manipulations as well as the standard shifter operations. Our design is based on the inverse butterfly and butterfly datapath circuits rather than the barrel shifter or log-shifter designs currently used. We show how this new shifter can implement the standard shift and rotate operations, as well as the more advanced extract, deposit, and mix operations found in some processors. Furthermore, it can perform important new classes of even more advanced bit manipulation instructions, such as arbitrary bit permutations, bit gather (or parallel extract), and bit scatter (or parallel deposit). Thus, our new functional unit provides the functionality of three functional units (the basic shifter, the multimedia-mix unit, and the advanced bit manipulation functional unit) while having a latency only slightly longer than that of the log-shifter. When performing only the existing functions of a shifter, it has significantly smaller area.
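
To make the semantics of bit gather (parallel extract) and bit scatter (parallel deposit) concrete, the following is a minimal software reference model. It is a bit-serial sketch for illustration only and does not model the inverse butterfly or butterfly circuits that the paper proposes; the function names `bit_gather` and `bit_scatter` are hypothetical and do not come from the paper.

```python
def bit_gather(x: int, mask: int) -> int:
    """Parallel extract: pack the bits of x selected by mask into the low-order bits.

    Hypothetical reference model, processing one mask bit per iteration.
    """
    result = 0
    out_pos = 0
    while mask:
        low = mask & -mask           # isolate the lowest set bit of the mask
        if x & low:
            result |= 1 << out_pos   # place the selected bit at the next packed position
        out_pos += 1
        mask &= mask - 1             # clear the lowest set bit of the mask
    return result


def bit_scatter(x: int, mask: int) -> int:
    """Parallel deposit: spread the low-order bits of x to the positions set in mask.

    Hypothetical reference model, processing one mask bit per iteration.
    """
    result = 0
    in_pos = 0
    while mask:
        low = mask & -mask           # next destination position, lowest first
        if (x >> in_pos) & 1:
            result |= low            # deposit the next source bit at that position
        in_pos += 1
        mask &= mask - 1
    return result


if __name__ == "__main__":
    # Gather the two nibbles selected by the mask, then scatter them back.
    packed = bit_gather(0xABCD, 0x0F0F)
    print(hex(packed))
    print(hex(bit_scatter(packed, 0x0F0F)))
```

Under this model the two operations are inverses on the masked positions: scattering the gathered bits with the same mask restores `x & mask`.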
