Paper Information - Overflow Controlled SIMD Arithmetic

Overflow Controlled SIMD Arithmetic

Although "SIMD within a register" parallel architectures have existed for almost 10 years, automatic optimizations for them are not yet well developed. Because most optimizations for SIMD architectures are transplanted from traditional vectorization techniques, many special features of SIMD architectures, such as packed operations, have not been thoroughly considered. Because operands are tightly packed within a register, there is no spare space to indicate overflow. To maintain the accuracy of automatically SIMDized programs, the operands must be unpacked to leave enough space for interim overflow, which introduces considerable overhead. Furthermore, the instructions that handle interim overflows can sometimes prevent other optimizations. This paper proposes a new technique, OCSA (overflow controlled SIMD arithmetic), to reduce the negative effects of interim overflow handling and eliminate the interference of interim overflows. We have applied our algorithm to the Berkeley multimedia benchmarks. The experimental results show that the OCSA algorithm can significantly improve the performance of ADPCM-Decoder (110%), MESA-Reflect (113%) and DJVU-Encoder (106%).
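The interim-overflow problem described in the abstract can be illustrated with a small portable C sketch. It is not the OCSA algorithm, which the abstract does not specify. It only contrasts an average computed in packed 8-bit lanes, where the interim sum wraps, with the same average computed after unpacking to 16-bit lanes. The function names (`packed_add_u8`, `packed_shr1_u8`, `avg_packed`, `avg_unpacked`) are hypothetical, and a 32-bit integer stands in for a SIMD register.

```c
#include <stdint.h>

/* Hypothetical helper: lane-wise add of four packed 8-bit values.
   Carries between lanes are suppressed, so each lane wraps modulo 256,
   as a packed byte add without saturation would. */
static uint32_t packed_add_u8(uint32_t a, uint32_t b) {
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    return low7 ^ ((a ^ b) & 0x80808080u);
}

/* Hypothetical helper: lane-wise logical shift right by one bit. */
static uint32_t packed_shr1_u8(uint32_t a) {
    return (a >> 1) & 0x7F7F7F7Fu;
}

/* Average computed entirely in packed 8-bit lanes.
   The interim sum a + b has no spare bit, so it may wrap. */
static uint32_t avg_packed(uint32_t a, uint32_t b) {
    return packed_shr1_u8(packed_add_u8(a, b));
}

/* Average computed after unpacking each byte into a 16-bit lane.
   The interim sum (at most 510) now fits, at the cost of extra
   unpack and repack operations and half the lanes per word. */
static uint32_t avg_unpacked(uint32_t a, uint32_t b) {
    uint32_t a_even = a & 0x00FF00FFu;         /* bytes 0 and 2 */
    uint32_t a_odd  = (a >> 8) & 0x00FF00FFu;  /* bytes 1 and 3 */
    uint32_t b_even = b & 0x00FF00FFu;
    uint32_t b_odd  = (b >> 8) & 0x00FF00FFu;

    uint32_t r_even = ((a_even + b_even) >> 1) & 0x00FF00FFu;
    uint32_t r_odd  = ((a_odd + b_odd) >> 1) & 0x00FF00FFu;

    return r_even | (r_odd << 8);              /* repack to 8-bit lanes */
}
```

With every lane of `a` set to 200 (`0xC8C8C8C8`) and every lane of `b` set to 100 (`0x64646464`), `avg_packed` yields 22 per lane because the interim sum 300 wraps to 44, while `avg_unpacked` yields the expected 150 per lane. The extra masking, shifting and repacking in `avg_unpacked` is the kind of overhead the abstract refers to.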

Binyu Zang | HongJiang Zhang | Hui Shi | Jiahua Zhu | Chuanqi Zhu

[1] Aart J. C. Bik, et al. Automatic Detection of Saturation and Clipping Idioms, 2002, LCPC.

[2] Mark Stephenson, et al. Bitwidth analysis with application to silicon compilation, 2000, PLDI '00.

[3] Aart J. C. Bik, et al. Automatic Intra-Register Vectorization for the Intel® Architecture, 2002, International Journal of Parallel Programming.

[4] Andreas Krall, et al. Compilation Techniques for Multimedia Processors, 2004, International Journal of Parallel Programming.

[5] Pradeep K. Dubey, et al. How Multimedia Workloads Will Change Processor Design, 1997, Computer.

[6] Henry G. Dietz, et al. Compiling for SIMD Within a Register, 1998, LCPC.

[7] Saman P. Amarasinghe, et al. Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.

[8] R. Govindarajan, et al. A Vectorizing Compiler for Multimedia Extensions, 2000, International Journal of Parallel Programming.