Implicit Data Permutation for SIMD Devices
[1] Francisco Tirado, et al. Improving superword level parallelism support in modern compilers, 2005, 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05).
[2] Ruby B. Lee. Subword permutation instructions for two-dimensional multimedia processing in MicroSIMD architectures, 2000, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors.
[3] Stamatis Vassiliadis, et al. Matrix register file and extended subwords: two techniques for embedded media processors, 2005, CF '05.
[4] Jaewook Shin, et al. Superword-level parallelism in the presence of control flow, 2005, International Symposium on Code Generation and Optimization.
[5] Ayal Zaks, et al. Auto-vectorization of interleaved data for SIMD, 2006, PLDI '06.
[6] Wonyong Sung, et al. An FPGA based SIMD processor with a vector memory unit, 2006, 2006 IEEE International Symposium on Circuits and Systems.
[7] Peter Kogge, et al. Generation of permutations for SIMD processors, 2005, LCTES '05.
[8] Franz Franchetti, et al. Efficient Utilization of SIMD Extensions, 2005, Proceedings of the IEEE.
[9] Chun Chen, et al. Model-Guided Empirical Optimization for Multimedia Extension Architectures: A Case Study, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[10] Ayal Zaks, et al. Vectorizing for a SIMdD DSP architecture, 2003, CASES '03.
[11] Emmett Witchel, et al. Increasing and detecting memory address congruence, 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[12] Erik Lindholm, et al. A user-programmable vertex engine, 2001, SIGGRAPH.
[13] Francky Catthoor, et al. Pack Transposition: Enhancing Superword Level Parallelism Exploitation, 2005, PARCO.
[14] Saman P. Amarasinghe, et al. Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.
[15] Peng Wu, et al. Vectorization for SIMD architectures with alignment constraints, 2004, PLDI '04.
[16] Gang Ren, et al. Optimizing data permutations for SIMD devices, 2006, PLDI '06.
[17] Michael Gschwind, et al. Optimizing Compiler for the CELL Processor, 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[18] Jaewook Shin, et al. Compiler-controlled caching in superword register files for multimedia extension architectures, 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[19] Xiaobo Sharon Hu, et al. Linear-time matrix transpose algorithms using vector register file with diagonal registers, 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.