Effective exploitation of SIMD resources in cross-ISA virtualization
Xiaoli Gong | Jin Wu | Wenwen Wang | Jian Dong | Ziyi Zhao | Ruili Fang | Decheng Zuo
[1] Kenneth A. Ross,et al. Rethinking SIMD Vectorization for In-Memory Databases , 2015, SIGMOD Conference.
[2] Super-Node SLP: optimized vectorization for code sequences containing operators and their inverse elements , 2019, CGO 2019.
[3] Mikel Luján,et al. Low overhead dynamic binary translation on ARM , 2017, PLDI.
[4] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[5] Zhang Jiang,et al. DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms , 2020, ICPP.
[6] Fabrice Bellard,et al. QEMU, a Fast and Portable Dynamic Translator , 2005, USENIX ATC, FREENIX Track.
[7] Albert Cohen,et al. Vapor SIMD: Auto-vectorize once, run everywhere , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[8] Richard Johnson,et al. The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, CGO.
[9] Wu-chun Feng,et al. ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors , 2015, ICS.
[10] Michael D. Smith,et al. Persistent Code Caching: Exploiting Code Reuse Across Executions and Applications , 2007, International Symposium on Code Generation and Optimization (CGO'07).
[11] Wei-Chung Hsu,et al. Dynamic translation of structured Loads/Stores and register mapping for architectures with SIMD extensions , 2017, LCTES.
[12] Vasileios Porpodas,et al. SuperGraph-SLP Auto-Vectorization , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[13] Stephen McCamant,et al. Enhancing Cross-ISA DBT Through Automatically Learned Translation Rules , 2018, ASPLOS.
[14] Wang Zhenjiang,et al. A Pattern Translation Method for Flags in Binary Translation , 2014 .
[15] Richard Johnson,et al. The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003.
[16] Stephen McCamant,et al. Efficient and scalable cross-ISA virtualization of hardware transactional memory , 2020, CGO.
[17] Yunhao Liu,et al. Mobile Gaming on Personal Computers with Direct Android Emulation , 2019, MobiCom.
[18] Bo Huang,et al. Optimizing dynamic binary translation for SIMD instructions , 2006, International Symposium on Code Generation and Optimization (CGO'06).
[19] Alexander Heinecke,et al. Anatomy of High-Performance Deep Learning Convolutions on SIMD Architectures , 2018, SC18: International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] Wenwen Wang,et al. Improving Dynamically-Generated Code Performance on Dynamic Binary Translators , 2018, VEE.
[21] Wenwen Wang,et al. Unleashing the Power of Learning: An Enhanced Learning-Based Approach for Dynamic Binary Translation , 2019, USENIX Annual Technical Conference.
[22] Binoy Ravindran,et al. Cross-ISA execution of SIMD regions for improved performance , 2019, SYSTOR.
[23] James Tuck,et al. Improving the Effectiveness of Searching for Isomorphic Chains in Superword Level Parallelism , 2017, 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[24] Lei Zou,et al. Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions , 2018, SIGMOD Conference.
[25] Alaa R. Alameldeen,et al. ZCOMP: Reducing DNN Cross-Layer Memory Footprint Using Vector Extensions , 2019, MICRO.
[26] Dean M. Tullsen,et al. Execution migration in a heterogeneous-ISA chip multiprocessor , 2012, ASPLOS XVII.
[27] Weihua Zhang,et al. More with Less – Deriving More Translation Rules with Less Training Data for DBTs Using Parameterization , 2020, 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[28] Stephen McCamant,et al. A General Persistent Code Caching Framework for Dynamic Binary Translation (DBT) , 2016, USENIX Annual Technical Conference.
[29] Carol Eidt,et al. SIMD support in .NET: abstract and concrete vector types and operations , 2020, CGO.
[30] Decheng Zuo,et al. PerfDBT: Efficient Performance Regression Testing of Dynamic Binary Translation , 2020, 2020 IEEE 38th International Conference on Computer Design (ICCD).
[31] Xiaoli Gong,et al. Enhancing Atomic Instruction Emulation for Cross-ISA Dynamic Binary Translation , 2021, 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[32] Barton P. Miller,et al. The Paradyn Parallel Performance Measurement Tool , 1995, Computer.
[33] Tiark Rompf,et al. SIMD intrinsics on managed language runtimes , 2018, CGO.
[34] Nalini Vasudevan,et al. FlexVec: auto-vectorization for irregular loops , 2016, PLDI.
[35] Ajay Jain,et al. Revec: program rejuvenation through revectorization , 2019, CC.
[36] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[37] Viktor Leis,et al. Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask , 2018, Proc. VLDB Endow..
[38] Wei-Chung Hsu,et al. Exploiting Asymmetric SIMD Register Configurations in ARM-to-x86 Dynamic Binary Translation , 2017, 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[39] Richard Veras,et al. When polyhedral transformations meet SIMD code generation , 2013, PLDI.
[40] Stephen McCamant,et al. Enabling Cross-ISA Offloading for COTS Binaries , 2017, MobiSys.
[41] Harry Wagstaff,et al. A Retargetable System-level DBT Hypervisor , 2019, USENIX Annual Technical Conference.
[42] Wenwen Wang,et al. Helper function inlining in dynamic binary translation , 2021, CC.
[43] Kenneth A. Ross,et al. Implementing database operations using SIMD instructions , 2002, SIGMOD '02.
[44] Derek Bruening,et al. Efficient, transparent, and comprehensive runtime code manipulation , 2004 .