Outer-loop vectorization - revisited for short SIMD architectures