Fast subword permutation instructions based on butterfly network

Many contemporary microprocessor architectures incorporate multimedia extensions that accelerate media-rich applications using subword arithmetic. While these extensions significantly improve the performance of most multimedia applications, the lack of subword rearrangement support can limit the performance gain. Several means of adding architectural support for subword rearrangement have been proposed and implemented, but none of them provides a fully general solution. In this paper, a new class of permutation instructions based on the butterfly interconnection network is proposed to address the general subword rearrangement problem. These instructions can perform an arbitrary permutation (without repetition) of n subwords within log n cycles, regardless of the subword size. The instruction encoding and the low-level implementation of the instructions are quite simple. An algorithm is also given to derive an instruction sequence for any permutation.
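
To make the butterfly structure concrete, the following is a minimal software sketch of a generic butterfly network acting on n subwords, where n is assumed to be a power of two. It is not the instruction encoding or the sequence-derivation algorithm of this paper: the names `butterfly_stage` and `butterfly_network`, and the layout of one control bit per switch, are hypothetical choices made only for illustration. Each of the log2(n) stages pairs subwords a fixed distance apart (n/2, then n/4, down to 1) and swaps a pair when its control bit is set.

```python
def butterfly_stage(subwords, distance, controls):
    """Apply one butterfly stage (illustrative model, not the paper's encoding).

    Element i is paired with element i + distance inside blocks of
    2 * distance elements; a pair is swapped when its control bit is set.
    """
    out = list(subwords)
    pair = 0
    for block in range(0, len(out), 2 * distance):
        for i in range(block, block + distance):
            if controls[pair]:
                out[i], out[i + distance] = out[i + distance], out[i]
            pair += 1
    return out


def butterfly_network(subwords, stage_controls):
    """Run all log2(n) stages, with pairing distances n/2, n/4, ..., 1."""
    n = len(subwords)
    if n == 0 or n & (n - 1):
        raise ValueError("number of subwords must be a power of two")
    stages = n.bit_length() - 1
    if len(stage_controls) != stages:
        raise ValueError("need one control vector per stage")
    out = list(subwords)
    distance = n // 2
    for controls in stage_controls:
        if len(controls) != n // 2:
            raise ValueError("need n/2 control bits per stage")
        out = butterfly_stage(out, distance, controls)
        distance //= 2
    return out


# Setting every switch in all three stages reverses eight subwords.
all_swap = [[1, 1, 1, 1]] * 3
reversed_subwords = butterfly_network(list("abcdefgh"), all_swap)
```

In this model the subword size never appears: the network only moves positions, so the same control bits apply to 8-bit, 16-bit, or wider subwords. A single pass through such a network has only (n/2)·log2(n) switches and therefore reaches only a subset of the n! permutations, which is why an arbitrary permutation calls for a derived sequence of steps rather than one fixed configuration.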