Auto-vectorization of interleaved data for SIMD
Ayal Zaks | Dorit Nuzman | Ira Rosen
[1] Michael Wolfe, et al. High performance compilers for parallel computing, 1995.
[2] Saman P. Amarasinghe, et al. Exploiting superword level parallelism with multimedia instruction sets, 2000, PLDI '00.
[3] Peng Wu, et al. Vectorization for SIMD architectures with alignment constraints, 2004, PLDI '04.
[4] Gang Ren, et al. An empirical study on the vectorization of multimedia applications for multimedia extensions, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[5] Hunter Scales, et al. AltiVec Extension to PowerPC Accelerates Media Processing, 2000, IEEE Micro.
[6] Uri C. Weiser, et al. MMX technology extension to the Intel architecture, 1996, IEEE Micro.
[7] Aart J. C. Bik. The Software Vectorization Handbook: Applying Multimedia Extensions for Maximum Performance, 2004.
[8] Peter Kogge, et al. Generation of permutations for SIMD processors, 2005, LCTES '05.
[9] Gilles Pokam, et al. SWARP: a retargetable preprocessor for multimedia instructions, 2004, Concurr. Comput. Pract. Exp.
[10] Ken Kennedy, et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2001.
[11] Peng Wu, et al. Efficient SIMD code generation for runtime alignment and length conversion, 2005, International Symposium on Code Generation and Optimization.
[12] Aart Johannes Casimir Bik. The software vectorization handbook, 2004.
[13] Sameh W. Asaad, et al. An innovative low-power high-performance programmable signal processor for digital communications, 2003, IBM J. Res. Dev.
[14] Kevin B. Smith. Support for the Intel® Pentium® 4 Processor with Hyper-Threading Technology in Intel® 8.0 Compilers, 2004.
[15] Aart J. C. Bik. The Software Vectorization Handbook: Applying Intel Multimedia Extensions for Maximum Performance, 2004.
[16] Ken Kennedy, et al. Practical dependence testing, 1991, PLDI '91.
[17] Jason Merrill. GENERIC and GIMPLE: A new tree representation for entire functions, 2003.
[18] Matthew Mattina, et al. Tarantula: a vector extension to the Alpha architecture, 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[19] Mateo Valero, et al. Exploiting a new level of DLP in multimedia applications, 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.
[20] Diego Novillo. Tree SSA: A New Optimization Infrastructure for GCC, 2004.
[21] H. Peter Hofstee, et al. Introduction to the Cell multiprocessor, 2005, IBM J. Res. Dev.
[22] Jaewook Shin, et al. Compiler-controlled caching in superword register files for multimedia extension architectures, 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[23] Andreas Krall, et al. Pointer Alignment Analysis for Processors with SIMD Instructions, 2003.
[24] Gang Ren, et al. Optimizing data permutations for SIMD devices, 2006, PLDI '06.
[25] Lizy Kurian John, et al. Exploiting SIMD parallelism in DSP and multimedia algorithms using the AltiVec technology, 1999, ICS '99.
[26] Jaewook Shin, et al. Superword-level parallelism in the presence of control flow, 2005, International Symposium on Code Generation and Optimization.
[27] Ayal Zaks, et al. Vectorizing for a SIMdD DSP architecture, 2003, CASES '03.
[28] Albert Cohen, et al. Induction Variable Analysis with Delayed Abstractions, 2005, HiPEAC.
[29] John A. Gunnels, et al. A high-performance SIMD floating point unit for BlueGene/L: architecture, compilation, and algorithm design, 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[30] Richard Henderson, et al. Multi-platform auto-vectorization, 2006, International Symposium on Code Generation and Optimization (CGO'06).
[31] Gang Ren, et al. A Preliminary Study on the Vectorization of Multimedia Applications for Multimedia Extensions, 2003, LCPC.
[32] Aart J. C. Bik, et al. Efficient Exploitation of Parallelism on Pentium III and Pentium 4 Processor-Based Systems, 2001.
[33] Franz Franchetti, et al. Vectorization techniques for the Blue Gene/L double FPU, 2005, IBM J. Res. Dev.
[34] Lizy Kurian John, et al. Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements, 2003, IEEE Trans. Computers.
[35] Krste Asanovic, et al. Torrent Architecture Manual, 1997.