Fast subword permutation instructions based on butterfly network
Many contemporary microprocessor architectures incorporate multimedia extensions that accelerate media-rich applications using subword arithmetic. Although these extensions significantly improve the performance of most multimedia applications, the lack of support for subword rearrangement can limit the performance gain. Several ways of adding architectural support for subword rearrangement have been proposed and implemented, but none provides a fully general solution. This paper proposes a new class of permutation instructions, based on the butterfly interconnection network, to address the general subword rearrangement problem. The instructions can perform an arbitrary permutation (without repetition) of n subwords within log n cycles, regardless of the subword size. Both the instruction encoding and the low-level implementation are simple. An algorithm is also given for deriving an instruction sequence for any permutation.
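To make the butterfly structure concrete, the following is a minimal software sketch of one pass through a butterfly network on n subwords, assuming n is a power of two. It is an illustration only, not the paper's instruction encoding or its sequence-derivation algorithm: the function names `butterfly_stage` and `butterfly_network`, and the layout of the control bits (one list of n/2 swap bits per stage), are hypothetical choices made for this example. The network has log2 n stages; the stage with distance d pairs position i with position i + d and swaps the pair when its control bit is set.

```python
def butterfly_stage(subwords, distance, swap_bits):
    """Apply one butterfly stage.

    Position i is paired with position i + distance (for every i whose
    `distance` bit is clear). A pair is swapped when its control bit is set.
    The ordering of swap_bits (by ascending lower index) is an assumption
    made for this sketch.
    """
    out = list(subwords)
    pair = 0
    for i in range(len(subwords)):
        if i & distance == 0:  # i is the lower member of its pair
            if swap_bits[pair]:
                out[i], out[i + distance] = out[i + distance], out[i]
            pair += 1
    return out


def butterfly_network(subwords, config):
    """Run all log2(n) stages, with distances n/2, n/4, ..., 1.

    `config` holds one list of n/2 control bits per stage (hypothetical
    layout, not the paper's encoding).
    """
    n = len(subwords)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("number of subwords must be a power of two")
    stages = n.bit_length() - 1
    if len(config) != stages:
        raise ValueError("need one control-bit list per stage")
    out = list(subwords)
    distance = n // 2
    for swap_bits in config:
        if len(swap_bits) != n // 2:
            raise ValueError("each stage needs n/2 control bits")
        out = butterfly_stage(out, distance, swap_bits)
        distance //= 2
    return out


# Usage: with every switch set, 8 subwords pass through 3 stages and
# position i ends up at position i XOR 7, which reverses the order.
all_swap = [[1] * 4 for _ in range(3)]
reversed_subwords = butterfly_network(list("abcdefgh"), all_swap)
```

The model works on list elements rather than bit fields, so it is independent of the subword size, mirroring the abstract's remark that the cycle count does not depend on it. A single pass like this reaches only some permutations; how the proposed instructions are sequenced to cover arbitrary ones is the subject of the paper's algorithm and is not reproduced here.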