Data layout transformation for structure vectorization on SIMD architectures

Structure references are commonly used at the core of applications in many domains, such as image processing, signal processing, and especially scientific and engineering computing. SIMD instruction sets such as SSE, AVX, AltiVec, and 3DNow provide a promising and widely available way to improve performance on modern processors. However, existing memory access constraints limit the performance achieved for structure references on modern SIMD architectures. In this paper, we propose a novel data layout transformation that removes these access obstacles, together with a static analysis technique that detects the loops in which this transformation is legal and suitable. The approach is implemented in the Open64 optimizing compiler. Experimental results show that the proposed method transforms applications with structure accesses into vectorizable code, thereby improving execution efficiency.
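The abstract does not spell out the exact transformation. The sketch below assumes it resembles the familiar array-of-structures to structure-of-arrays (AoS to SoA) conversion, which is the standard way to turn strided field accesses into unit-stride ones. All names (`particle_aos`, `particles_soa`, `aos_to_soa`, `scale_x_aos`, `scale_x_soa`) are hypothetical illustrations, not the paper's Open64 implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: assumes the layout transformation is
   AoS -> SoA. This is not the paper's Open64 implementation. */

/* Array-of-structures (AoS): the fields of one element are adjacent, so
   consecutive x values are sizeof(struct particle_aos) bytes apart. */
struct particle_aos {
    float x, y, z;
};

/* Structure-of-arrays (SoA): each field is stored contiguously, so
   consecutive x values are adjacent in memory. */
struct particles_soa {
    float *x, *y, *z;
};

/* Copy n AoS elements into caller-allocated SoA arrays. */
void aos_to_soa(const struct particle_aos *in, struct particles_soa out, size_t n)
{
    assert(n == 0 || (in != NULL && out.x != NULL && out.y != NULL && out.z != NULL));
    for (size_t i = 0; i < n; i++) {
        out.x[i] = in[i].x;
        out.y[i] = in[i].y;
        out.z[i] = in[i].z;
    }
}

/* AoS loop: p[i].x is accessed with a stride of 3 floats, so a contiguous
   vector load would also pick up y and z and need extra shuffles. */
void scale_x_aos(struct particle_aos *p, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        p[i].x *= s;
}

/* SoA loop: x[i] is a unit-stride access that a vectorizing compiler can
   map directly onto full-width vector loads and stores. */
void scale_x_soa(float *restrict x, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= s;
}
```

Both loops compute the same result. The difference is that only the SoA version presents contiguous data to the vectorizer. Rewriting a layout like this is only safe when every access to the structure in the affected region is rewritten consistently, which is presumably the kind of condition the static legality analysis described above must establish.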