Boosting the Performance of Multimedia Applications Using SIMD Instructions

The multimedia extensions (MME) of modern processors provide SIMD ISAs that speed up operations typical of multimedia applications. However, automatic vectorization support for these extensions is still immature. The key difficulty is vectorizing, efficiently and generally, the source-code idioms that the SIMD ISAs support directly. In this paper, we introduce a powerful and extendable recognition engine that addresses this problem: it needs only a small number of rules to recognize many such idioms and generate efficient SIMD instructions. We integrated this engine into a classic vectorization framework and obtained good speedups on some real-life applications.
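
To make the notion of a SIMD-ISA-supported idiom concrete, the following is a minimal sketch of one common case, an unsigned 8-bit saturating add written in plain scalar C. It is an illustration only and is not taken from the paper; the function name `sat_add_u8` and its signature are hypothetical. Multimedia extensions typically offer a single instruction for this pattern (for example, PADDUSB on x86), so an idiom recognizer would aim to map the widen, add, and clamp sequence in the loop body onto that instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: scalar form of an unsigned saturating add.
 * Each result is clamped to 255 instead of wrapping around on overflow. */
void sat_add_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int sum = a[i] + b[i];                      /* operands promote to int, so no wraparound */
        dst[i] = (sum > 255) ? 255 : (uint8_t)sum;  /* clamp: the saturation idiom */
    }
}
```

A conventional vectorizer sees the widening to `int` and the conditional as obstacles, whereas recognizing the whole expression as one saturating operation lets the loop process a full vector of bytes per instruction.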
