IDISA+: A portable model for high performance SIMD programming

Most of today’s commodity processors have built-in single-instruction multiple-data (SIMD) instructions and provide SIMD within a register. However, different processor vendors tend to provide different SIMD instruction sets, which poses significant challenges to cross-platform SIMD programming. This thesis proposes a model called IDISA+ to overcome these compatibility issues and enable portable SIMD programming. The model defines more than 60 carefully selected SIMD operations, which are believed to support a broad range of applications. We have implemented the model as a toolkit with two components: a code generator that produces portable libraries, and a test suite that analyzes both the correctness and the performance of those libraries. For performance, the model uses a least-instruction-count mechanism to select the best among the alternative implementations of each library routine. The experimental results demonstrate the effectiveness of the generator and show that the generated libraries perform better than hand-tuned libraries.
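
The least-instruction-count selection mentioned above can be illustrated with a minimal sketch. It is not the thesis's actual generator: the `Alternative` class, the function `select_least_instruction_count`, and all instruction labels (`add_8`, `add_16`, `and`, `andnot`, `or`) are hypothetical names introduced here for illustration only. The sketch assumes that each library routine has several candidate implementations, that each candidate is described by the sequence of target instructions it would emit, and that a candidate is usable only if the target provides every instruction it needs.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Set


@dataclass(frozen=True)
class Alternative:
    """One candidate implementation of a library routine (hypothetical model)."""
    name: str
    instructions: Sequence[str]  # target instructions the candidate would emit


def select_least_instruction_count(
    alternatives: Iterable[Alternative], available: Set[str]
) -> Alternative:
    """Return the usable candidate that emits the fewest instructions.

    A candidate is usable only if every instruction it needs is available
    on the target. Ties are broken by the order of `alternatives`.
    """
    usable = [
        alt for alt in alternatives
        if all(instr in available for instr in alt.instructions)
    ]
    if not usable:
        raise ValueError("no alternative is implementable on this target")
    return min(usable, key=lambda alt: len(alt.instructions))


# Hypothetical candidates for an addition on 8-bit fields.
ALTERNATIVES = [
    # Emulation: mask the fields, add them at 16-bit width, recombine.
    Alternative("emulated_via_16_bit_add",
                ("and", "add_16", "andnot", "add_16", "or")),
    # Direct mapping when the target has a native 8-bit add.
    Alternative("native_8_bit_add", ("add_8",)),
]

if __name__ == "__main__":
    without_native = {"and", "andnot", "or", "add_16"}
    with_native = without_native | {"add_8"}
    print(select_least_instruction_count(ALTERNATIVES, without_native).name)
    print(select_least_instruction_count(ALTERNATIVES, with_native).name)
```

On a target without the native instruction only the emulated candidate is usable, so it is selected. Once the native instruction is available, the one-instruction candidate wins. Instruction count is a simple static proxy for cost; it ignores latency and throughput differences between instructions.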
