Exploiting mixed SIMD parallelism by reducing data reorganization overhead
[1] Albert Cohen, et al. Polyhedral-Model Guided Loop-Nest Auto-Vectorization, 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[2] Timothy M. Jones, et al. PSLP: Padded SLP automatic vectorization, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[3] Mahmut T. Kandemir, et al. A compiler framework for extracting superword level parallelism, 2012, PLDI '12.
[4] R. C. Whaley, et al. Vectorization past dependent branches through speculation, 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[5] Jaewook Shin, et al. Superword-level parallelism in the presence of control flow, 2005, International Symposium on Code Generation and Optimization.
[6] Saman P. Amarasinghe, et al. Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.
[7] Peng Wu, et al. Vectorization for SIMD architectures with alignment constraints, 2004, PLDI '04.
[8] Peng Zhao, et al. An integrated simdization framework using virtual vectors, 2005, ICS '05.
[9] Jaewook Shin. Introducing Control Flow into Vectorized Code, 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).
[10] Ayal Zaks, et al. Auto-vectorization of interleaved data for SIMD, 2006, PLDI '06.
[11] Barbara M. Chapman, et al. Supercompilers for parallel and vector computers, 1990, ACM Press frontier series.
[12] Franz Franchetti, et al. Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures, 2011, CC.
[13] Jaewook Shin, et al. Compiler-controlled caching in superword register files for multimedia extension architectures, 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[14] Vivek Sarkar, et al. Efficient Selection of Vector Instructions Using Dynamic Programming, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[15] Seonggun Kim, et al. Efficient SIMD code generation for irregular kernels, 2012, PPoPP '12.
[16] Gang Ren, et al. Optimizing data permutations for SIMD devices, 2006, PLDI '06.
[17] Scott A. Mahlke, et al. SIMD defragmenter: efficient ILP realization on data-parallel architectures, 2012, ASPLOS XVII.
[18] Ken Kennedy, et al. Automatic translation of FORTRAN programs to vector form, 1987, TOPLAS.
[19] Sumit Gulwani, et al. From relational verification to SIMD loop synthesis, 2013, PPoPP '13.
[20] David A. Padua, et al. An Evaluation of Vectorizing Compilers, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[21] Sebastian Hack, et al. Whole-function vectorization, 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[22] Emmett Witchel, et al. Increasing and detecting memory address congruence, 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[23] Richard Veras, et al. When polyhedral transformations meet SIMD code generation, 2013, PLDI.
[24] Timothy M. Jones, et al. Throttling Automatic Vectorization: When Less is More, 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).