Compilation Techniques for Multimedia Processors

The large processing power required by multimedia applications has led to multimedia extensions in microprocessor instruction sets that exploit subword parallelism. Examples of these extended instruction sets are the Visual Instruction Set (VIS) of the UltraSPARC processor, the AltiVec instruction set of the PowerPC processor, the MMX and ISS extensions of the Pentium processors, and the MAX-2 instruction set of the HP PA-RISC processor. Currently, these extensions can be used only by programs written in assembly language, through system libraries, or by calling specialized macros from a high-level language. As a result, most applications do not use these instructions. We propose two code generation techniques that produce native code using these multimedia extensions for programs written in a high-level language: classical vectorization and vectorization by unrolling. Vectorization by unrolling is simpler than classical vectorization because data dependence analysis is reduced to the analysis of an acyclic control flow graph. We also address the problem of unaligned memory accesses, which can be handled by both static analysis and dynamic run-time checking. Preliminary experimental results for a code generator targeting the UltraSPARC VIS instruction set show that speedups of up to a factor of 4.8 are possible, and that vectorization by unrolling is much simpler than classical vectorization yet just as effective.
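To make the idea of vectorization by unrolling with a dynamic alignment check concrete, here is a minimal sketch in portable C. It is not the paper's code generator, and it does not use VIS instructions. The function names (`add_u8_scalar`, `packed_add_u8`, `add_u8_unrolled`) are hypothetical, and the packed addition is emulated with ordinary 64-bit integer arithmetic as a stand-in for a real subword-parallel instruction. The loop is unrolled by the number of 8-bit elements that fit in one 64-bit word, and a scalar prologue runs until the destination pointer is 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference loop: element-wise wrap-around add of 8-bit values. */
void add_u8_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}

/* Add eight packed 8-bit lanes held in 64-bit words without letting
 * carries cross lane boundaries. This emulates a packed add; a real
 * multimedia extension would provide it as a single instruction. */
uint64_t packed_add_u8(uint64_t x, uint64_t y) {
    const uint64_t H = 0x8080808080808080ULL;   /* top bit of every lane   */
    uint64_t low = (x & ~H) + (y & ~H);         /* add the low 7 bits      */
    return low ^ ((x ^ y) & H);                 /* fix up the top bits     */
}

/* The same loop, unrolled by 8 so that each iteration of the main loop
 * processes one 64-bit word of packed 8-bit elements. */
void add_u8_unrolled(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    size_t i = 0;

    /* Dynamic run-time check: scalar prologue until dst is 8-byte aligned. */
    while (i < n && ((uintptr_t)(dst + i) & 7u) != 0) {
        dst[i] = (uint8_t)(a[i] + b[i]);
        i++;
    }

    /* Main loop: eight elements per iteration. The sources may still be
     * unaligned, so they are loaded with memcpy, which is safe for any
     * address; a compiler could pick aligned loads when static analysis
     * proves alignment. */
    for (; i + 8 <= n; i += 8) {
        uint64_t x, y, r;
        memcpy(&x, a + i, sizeof x);
        memcpy(&y, b + i, sizeof y);
        r = packed_add_u8(x, y);
        memcpy(dst + i, &r, sizeof r);
    }

    /* Scalar epilogue for the remaining (fewer than 8) elements. */
    for (; i < n; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

Because the unrolled body is straight-line code, the only question a compiler has to answer is whether the eight copies of the statement can be merged into one packed operation, which is the simplification the abstract attributes to vectorization by unrolling. The prologue and epilogue loops are one plausible way to realize the dynamic alignment check; the abstract does not specify how the check is generated.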
