Convergence and Scalarization in Whole Function Vectorization

Whole-function vectorization is an important optimization for implementing SPMD programs on multi-core platforms. A drawback of the SPMD model is that many instructions are redundant across threads, and this redundancy persists after vectorization. This paper proposes to reduce that overhead by detecting scalar operations and extracting them from the vectorized instructions. We design an algorithm that handles control flow and data flow together, using convergence and invariance analysis to statically identify convergent execution and invariant values or instructions. The algorithm is effective for implementing SPMD programs on multi-core platforms: experiments show that our method improves execution efficiency by 13.3%.
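
As a rough illustration of the invariance side of such an analysis, the sketch below propagates a "varying" property through a toy straight-line instruction list. It is not the algorithm from the paper: the `Instr` record, the `analyze` function, the opcode names, and the sample kernel are all hypothetical, and the sketch assumes no loops or phi nodes, so variance induced by divergent control flow is ignored. A value that never depends on the thread ID is treated as invariant (a candidate for scalarization), and a branch whose condition is invariant is treated as convergent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction: dest = op(args)."""
    dest: str
    op: str
    args: tuple


def analyze(instrs, varying_seeds):
    """Split instructions into scalar candidates and convergent branches.

    A value is 'varying' if it may differ between threads. Anything that
    is not reachable from a varying seed is assumed invariant.
    """
    varying = set(varying_seeds)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for ins in instrs:
            if ins.dest not in varying and any(a in varying for a in ins.args):
                varying.add(ins.dest)
                changed = True
    # Non-branch instructions with invariant results can stay scalar.
    scalar = [i.dest for i in instrs if i.op != "br" and i.dest not in varying]
    # Branches with invariant conditions are taken the same way by all threads.
    convergent = [i.dest for i in instrs if i.op == "br" and i.dest not in varying]
    return scalar, convergent


# Hypothetical kernel body: 'tid' is the thread ID, 'n' and 'a' are
# kernel parameters shared by every thread.
KERNEL = [
    Instr("t0", "mul", ("n", "4")),
    Instr("t1", "add", ("a", "t0")),
    Instr("t2", "add", ("t1", "tid")),
    Instr("c0", "lt", ("t0", "100")),
    Instr("br0", "br", ("c0",)),
    Instr("c1", "lt", ("t2", "100")),
    Instr("br1", "br", ("c1",)),
]

scalar, convergent = analyze(KERNEL, varying_seeds={"tid"})
print("scalar candidates:", scalar)
print("convergent branches:", convergent)
```

In this toy kernel only `t2`, `c1`, and `br1` depend on the thread ID, so only they would need vector instructions or mask handling; the remaining operations could be executed once per vector group.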
