Automatic Vectorization of Interleaved Data Revisited
[1] Mahmut T. Kandemir,et al. A compiler framework for extracting superword level parallelism , 2012, PLDI '12.
[2] Franz Franchetti,et al. A SIMD vectorizing compiler for digital signal processing algorithms , 2002, Proceedings 16th International Parallel and Distributed Processing Symposium.
[3] Scott A. Mahlke,et al. SIMD defragmenter: efficient ILP realization on data-parallel architectures , 2012, ASPLOS XVII.
[4] Gang Ren,et al. Optimizing data permutations for SIMD devices , 2006, PLDI '06.
[5] Saman P. Amarasinghe,et al. Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.
[6] Peng Wu,et al. Vectorization for SIMD architectures with alignment constraints , 2004, PLDI '04.
[7] Peter Kogge,et al. Generation of permutations for SIMD processors , 2005, LCTES '05.
[8] Jaewook Shin,et al. Superword-level parallelism in the presence of control flow , 2005, International Symposium on Code Generation and Optimization.
[9] Alfred V. Aho,et al. Code generation using tree matching and dynamic programming , 1989, ACM Trans. Program. Lang. Syst..
[10] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[11] Ruby B. Lee. Accelerating multimedia with enhanced microprocessors , 1995, IEEE Micro.
[12] Kenneth H. Rosen,et al. "Discrete Mathematics and its Applications", 7th Edition, Tata McGraw-Hill Pub. Co. Ltd., New Delhi, Special Indian Edition, 2011 , 2015 .
[13] Ayal Zaks,et al. Auto-vectorization of interleaved data for SIMD , 2006 .
[14] Ayal Zaks,et al. Auto-vectorization of interleaved data for SIMD , 2006, PLDI '06.
[15] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[16] David Padua,et al. Optimizing data permutations for SIMD devices , 2006 .
[17] Alexei Kudriavtsev,et al. Generation of permutations for SIMD processors , 2005 .
[18] Kenneth H. Rosen,et al. Discrete Mathematics and its Applications , 2000 .
[19] Lizy Kurian John,et al. Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements , 2003, IEEE Trans. Computers.
[20] Sebastian Hack,et al. The Impact of the SIMD Width on Control-Flow and Memory Divergence , 2014, ACM Trans. Archit. Code Optim..
[21] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[22] Endong Wang,et al. Intel Math Kernel Library , 2014 .
[23] Jaewook Shin,et al. Compiler-controlled caching in superword register files for multimedia extension architectures , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.
[24] Vivek Sarkar,et al. Efficient Selection of Vector Instructions Using Dynamic Programming , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[25] Albert Cohen,et al. A Polyhedral Approach to Ease the Composition of Program Transformations , 2004, Euro-Par.
[26] Dhananjay M. Dhamdhere,et al. Efficient Retargetable Code Generation Using Bottom-up Tree Pattern Matching , 1990, Comput. Lang..
[27] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[28] Sebastian Hack,et al. Whole-function vectorization , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[29] Erez Petrank,et al. New Algorithms for SIMD Alignment , 2007, CC.
[30] David A. Padua,et al. An Evaluation of Vectorizing Compilers , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[31] Christopher W. Fraser,et al. BURG: fast optimal instruction selection and tree parsing , 1992, SIGP.
[32] Franz Franchetti,et al. Generating SIMD Vectorized Permutations , 2008, CC.
[33] Jun Liu,et al. A compiler framework for extracting superword level parallelism , 2012 .