Efficient and retargetable SIMD translation in a dynamic binary translator
Wei-Chung Hsu | Ding-Yong Hong | Jan-Jan Wu | Sheng-Yu Fu | Yu-Ping Liu