SoftSIMD - Exploiting Subword Parallelism Using Source Code Transformations

SIMD instructions are used to speed up multimedia applications in high-performance embedded computing. Vendors often implement them as proprietary extensions that are incompatible with one another, so porting software between platforms is complex and time-consuming. Moreover, many existing embedded processors have no SIMD extensions at all, although they do provide a data path that is 32 bits wide or wider. Multimedia applications usually work on short data types of 8 or 16 bits. Such algorithms therefore use only the lower bits of the data path and exploit only a fraction of the available computing power. This paper discusses how the upper bits of the data path can be put to use by emulating true SIMD instructions. These instructions are implemented purely in software in a high-level language such as C. The application can therefore be modified through source code transformations, which are inherently portable. The benefit of this approach is that computing resources are used more efficiently without compromising the portability of the code. Experiments have shown that this approach can yield a significant speedup.
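To make the idea concrete, the following is a minimal sketch of one emulated SIMD operation in plain C, assuming four unsigned 8-bit subwords packed into one 32-bit word and wrap-around arithmetic per subword. The function names (softsimd_add_u8x4, scalar_add_u8x4, softsimd_check) are hypothetical and chosen for illustration only; the sketch uses a well-known masking technique and is not necessarily the transformation the paper applies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: add four packed unsigned 8-bit lanes in one 32-bit
 * operation. Each lane wraps around modulo 256 independently. */
static uint32_t softsimd_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t top = 0x80808080u;        /* top bit of every lane */

    /* Add only the lower 7 bits of each lane. The largest lane sum is
     * 0x7F + 0x7F = 0xFE, so no carry can spill into the next lane. */
    uint32_t low = (a & ~top) + (b & ~top);

    /* Fold the top bits back in with XOR, which adds them without
     * producing a carry out of the lane. */
    return low ^ ((a ^ b) & top);
}

/* Scalar reference: the same result computed one lane at a time. */
static uint32_t scalar_add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        result |= (uint32_t)(uint8_t)(x + y) << (8 * lane);
    }
    return result;
}

/* Check that the packed version agrees with the scalar reference. */
static void softsimd_check(uint32_t a, uint32_t b)
{
    assert(softsimd_add_u8x4(a, b) == scalar_add_u8x4(a, b));
}
```

For example, softsimd_add_u8x4(0x01FF7F80, 0x01010180) yields 0x02008000: the lanes 0xFF + 0x01 and 0x80 + 0x80 each wrap to 0x00 without disturbing their neighbours. The packed version uses a handful of word-wide operations where the scalar loop handles one subword per iteration, which is the kind of saving the abstract refers to.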
