A Compiler Approach for Exploiting Partial SIMD Parallelism