Permutation optimization for SIMD devices
Li Shen | Zhiying Wang | Libo Huang
[1] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[2] Vladimir M. Pentkovski, et al. Implementing Streaming SIMD Extensions on the Pentium III Processor, 2000, IEEE Micro.
[3] Peter Kogge, et al. Generation of permutations for SIMD processors, 2005, LCTES '05.
[4] Ruby B. Lee. Multimedia extensions for general-purpose processors, 1997, 1997 IEEE Workshop on Signal Processing Systems (SiPS 97): Design and Implementation (formerly VLSI Signal Processing).
[5] Stamatis Vassiliadis, et al. Performance Impact of Misaligned Accesses in SIMD Extensions, 2006.
[6] Emmett Witchel, et al. Increasing and detecting memory address congruence, 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[7] Todd M. Austin, et al. The SimpleScalar tool set, version 2.0, 1997, CARN.
[8] Saman P. Amarasinghe, et al. Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.
[9] Peng Wu, et al. Vectorization for SIMD architectures with alignment constraints, 2004, PLDI '04.
[10] Gang Ren, et al. Optimizing data permutations for SIMD devices, 2006, PLDI '06.
[11] Stamatis Vassiliadis, et al. Matrix register file and extended subwords: two techniques for embedded media processors, 2005, CF '05.
[12] Ayal Zaks, et al. Vectorizing for a SIMdD DSP architecture, 2003, CASES '03.