Speculatively vectorized bytecode

Computing systems are increasingly diverse, presenting a complex and moving target to software developers. Virtual machines and just-in-time (JIT) compilers have been proposed to mitigate this complexity: they offer a single, stable abstract machine model that hides architectural details from programmers. SIMD capabilities are common among current and expected computing systems, and efficient exploitation of SIMD instructions has become crucial for the performance of many applications. Existing auto-vectorizers operate within traditional static optimizing compilers and rely on details of the target architecture when generating SIMD instructions. Unfortunately, auto-vectorizers are currently too complex to be included in a constrained JIT environment. In this paper we propose Vapor SIMD, a speculative approach for effective just-in-time vectorization. Vapor SIMD first applies complex ahead-of-time techniques to vectorize source code and produce bytecode in a standard portable format. Advanced JIT compilers can then quickly tailor this bytecode to exploit the SIMD capabilities of appropriate platforms, yielding up to 14.7x and 11.8x speedups on x86 and PowerPC platforms (including JIT-compilation time). JIT compilers can also seamlessly revert to non-vector code, either when SIMD capabilities are absent or when a third-party non-vectorizing JIT compiler is used, yielding 93% or more of the original performance.
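
The split between ahead-of-time vectorization and JIT-time tailoring can be illustrated with a minimal sketch. It is not the Vapor SIMD bytecode format or any real JIT: the opcode names (`vadd`, `vmul`), the `lower` function, and the `simd_width` parameter are all hypothetical, and SIMD lanes are only modeled with Python list slices. The point is that the portable operation does not fix a lane count, so the same bytecode can be lowered to a vector kernel or to plain scalar code.

```python
from typing import Callable, List

# Hypothetical portable vector opcodes: each names an element-wise
# operation but says nothing about how many lanes the target has.
VECTOR_OPS = {
    "vadd": lambda x, y: x + y,
    "vmul": lambda x, y: x * y,
}

Kernel = Callable[[List[float], List[float]], List[float]]


def lower(opcode: str, simd_width: int) -> Kernel:
    """Stand-in for a JIT: choose a lowering from the target's SIMD width.

    simd_width <= 1 models a target without SIMD support, or a
    non-vectorizing JIT, and selects the scalar fallback.
    """
    scalar_op = VECTOR_OPS[opcode]

    def scalar_kernel(a: List[float], b: List[float]) -> List[float]:
        # Fallback path: one element per step.
        return [scalar_op(x, y) for x, y in zip(a, b)]

    def simd_kernel(a: List[float], b: List[float]) -> List[float]:
        out: List[float] = []
        n = len(a)
        main = n - n % simd_width
        # Main loop: each iteration models one SIMD instruction
        # operating on simd_width lanes at once.
        for i in range(0, main, simd_width):
            lanes_a = a[i:i + simd_width]
            lanes_b = b[i:i + simd_width]
            out.extend(scalar_op(x, y) for x, y in zip(lanes_a, lanes_b))
        # Scalar epilogue for elements that do not fill a whole vector.
        out.extend(scalar_op(a[i], b[i]) for i in range(main, n))
        return out

    return simd_kernel if simd_width > 1 else scalar_kernel
```

Calling `lower("vadd", 4)` and `lower("vadd", 1)` on the same inputs yields the same result; only the assumed execution strategy differs, which mirrors the revert-to-scalar behavior described in the abstract.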
