Compiler optimizations for processors with SIMD instructions

To achieve maximum efficiency, modern embedded processors for media applications exploit single instruction multiple data (SIMD) instructions. SIMD instructions provide a form of vectorization where a large machine word is viewed as a vector of subwords and the same operation is performed on all subwords in parallel. Systematic usage of SIMD instructions can significantly improve program performance. With C becoming the dominant language for programming embedded devices, there is a clear need for C compilers that use SIMD instructions whenever appropriate. However, SIMD instructions typically require each memory access to be aligned with the instruction's data access size. Therefore an important problem in designing the compiler is to determine whether a C pointer is aligned, i.e. whether it refers to the beginning of a machine word. In this paper, we describe our SIMD generation algorithm and present an analysis method which determines the alignment of pointers at compile time. The alignment information is used to reduce the number of dynamic alignment checks and the overhead incurred by them. Our method uses an interprocedural analysis which propagates pointer alignment information in function bodies and through function calls. The effectiveness of our method is supported by experimental results which show that in typical programs the alignments of about 50p of the pointers can be statically determined. Copyright © 2006 John Wiley & Sons, Ltd.

[1]  Rainer Leupers,et al.  Code optimization techniques for embedded processors - methods, algorithms, and tools , 2000 .

[2]  Dr. Rainer Leupers Code Optimization Techniques for Embedded Processors , 2000, Springer US.

[3]  Peng Wu,et al.  Efficient SIMD code generation for runtime alignment and length conversion , 2005, International Symposium on Code Generation and Optimization.

[4]  Bjarne Steensgaard,et al.  Points-to analysis in almost linear time , 1996, POPL '96.

[5]  Jong-Deok Choi,et al.  Interprocedural pointer alias analysis , 1999, TOPL.

[6]  Laurie J. Hendren,et al.  Context-sensitive interprocedural points-to analysis in the presence of function pointers , 1994, PLDI '94.

[7]  Saman P. Amarasinghe,et al.  Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.

[8]  Emmett Witchel,et al.  Increasing and detecting memory address congruence , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[9]  Peng Wu,et al.  Vectorization for SIMD architectures with alignment constraints , 2004, PLDI '04.

[10]  Andreas Krall,et al.  XDSPCORE: a compiler-based configurable digital signal processor , 2004, IEEE Micro.

[11]  Aart J. C. Bik Software Vectorization Handbook, The: Applying Intel Multimedia Extensions for Maximum Performance , 2004 .

[12]  R. Nigel Horspool,et al.  Ultra fast cycle-accurate compiled emulation of inorder pipelined architectures , 2007, J. Syst. Archit..

[13]  Aart J. C. Bik,et al.  Automatic Intra-Register Vectorization for the Intel® Architecture , 2002, International Journal of Parallel Programming.

[14]  Chris Hankin,et al.  Abstract Interpretation of Declarative Languages , 1987 .

[15]  R. Govindarajan,et al.  A Vectorizing Compiler for Multimedia Extensions , 2000, International Journal of Parallel Programming.

[16]  Emmett Witchel,et al.  Techniques for Increasing and Detecting Memory Alignment , 2001 .

[17]  Mark N. Wegman,et al.  Efficiently computing static single assignment form and the control dependence graph , 1991, TOPL.

[18]  Jack W. Davidson,et al.  Memory access coalescing: a technique for eliminating redundant memory accesses , 1994, PLDI '94.

[19]  Andreas Krall,et al.  Compilation Techniques for Multimedia Processors , 2004, International Journal of Parallel Programming.