Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors

This paper describes a new basis for implementing a shifter functional unit. We present a design based on the inverse butterfly and butterfly datapath circuits that performs the standard shift and rotate operations, as well as the more advanced extract, deposit and mix operations found in some processors. The design also supports important new classes of even more advanced, recently proposed bit manipulation instructions: arbitrary bit permutations, bit scatter and bit gather. The new functional unit's datapath is comparable in latency to that of the classic barrel shifter. It replaces two existing functional units, the shifter and the mix unit, with a single, much more powerful one.
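
To illustrate how a butterfly-style datapath can perform a standard rotate, the Python sketch below models an n-bit inverse butterfly network. The network has log2(n) stages, and stage j conditionally swaps bit pairs that are 2^j apart. The sketch also computes one set of control bits that makes the network rotate left. Several details are illustrative assumptions rather than the paper's circuit or control-generation logic: the bit ordering (bit 0 is the LSB), the stage order (strides 1, 2, ..., n/2), the hypothetical helper names (`delta_swap`, `inverse_butterfly`, `rotate_controls`, `rotl_ibfly`, `rotl_reference`) and the control-bit rule.

```python
"""Software model of an n-bit inverse butterfly network doing rotate-left.

Illustrative assumptions (not taken from the paper): n is a power of two,
bit 0 is the LSB, and stage j exchanges bits that are 2**j apart.
"""


def delta_swap(x: int, mask: int, d: int) -> int:
    """Swap bit i with bit i + d for every bit i that is set in mask."""
    t = ((x >> d) ^ x) & mask  # 1 where the two bits of a selected pair differ
    return x ^ t ^ (t << d)    # flipping both bits of a differing pair swaps them


def inverse_butterfly(x: int, stage_masks: list[int]) -> int:
    """Apply the stages in order of stride 1, 2, 4, ..., n/2."""
    for j, mask in enumerate(stage_masks):
        x = delta_swap(x, mask, 1 << j)
    return x


def rotate_controls(s: int, n: int) -> list[int]:
    """Hypothetical control masks that make the network rotate left by s.

    Invariant: after stage j, every aligned block of size m = 2**(j+1)
    holds its own original bits rotated left by s mod m. Stage j merges two
    half-blocks already rotated by s mod (m/2) into one block rotated by s mod m.
    """
    k = n.bit_length() - 1
    masks = []
    for j in range(k):
        d = 1 << j   # half-block size, equal to this stage's stride
        m = 2 * d    # block size after this stage
        sj = s % m
        mask = 0
        for base in range(0, n, m):
            for p in range(d):
                # Swap pair (base+p, base+p+d) when the bit wanted at
                # offset p currently sits in the other half-block.
                swap = p < sj if sj < d else p >= sj - d
                if swap:
                    mask |= 1 << (base + p)
        masks.append(mask)
    return masks


def rotl_ibfly(x: int, s: int, n: int) -> int:
    """Rotate an n-bit value left by s using the inverse butterfly model."""
    return inverse_butterfly(x, rotate_controls(s % n, n))


def rotl_reference(x: int, s: int, n: int) -> int:
    """Plain shift-and-or rotate, used to check the network model."""
    s %= n
    full = (1 << n) - 1
    return ((x << s) | (x >> (n - s))) & full


if __name__ == "__main__":
    print(bin(rotl_ibfly(0b00010110, 3, 8)))  # 0b10110000
```

For example, `rotl_ibfly(0b00010110, 3, 8)` returns `0b10110000`, the same value that `rotl_reference` produces. In this model every block within a stage is rotated by the same amount, so all blocks of a stage share one control pattern. The other operations named above (extract, deposit, mix, permutations, scatter and gather) would need different control bits and, for some of them, the butterfly network as well; this sketch does not attempt them.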
