Evaluating compiler technology for control-flow optimizations for multimedia extension architectures

This paper addresses how to automatically generate code for multimedia extension architectures in the presence of conditionals. We evaluate the costs and benefits of exploiting branches on the aggregate condition codes associated with the fields of a superword (an aggregate object larger than a machine word) such as the branch-on-any instruction of the AltiVec. Branch-on-superword-condition-codes (BOSCC) instructions allow fast detection of aggregate conditions, an optimization opportunity often found in multimedia applications. This paper presents compiler analyses and techniques for generating efficient parallel code using BOSCC instructions. We evaluate our approach, which has been implemented in the SUIF compiler, through a set of experiments with multimedia benchmarks, and compare it with the default approach previously implemented in our compiler. Our experimental results show that using BOSCC instructions can result in better performance for applications where the aggregate condition codes of a superword often evaluate to the same value.

[1]  Chun Chen,et al.  The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.

[2]  M. Schlansker,et al.  On Predicated Execution , 1991 .

[3]  J. Wawrzynek CNS-1 Architecture Specification A Connectionist Network Supercomputer , 1993 .

[4]  Saman P. Amarasinghe,et al.  Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.

[5]  Aart J. C. Bik,et al.  Automatic Intra-Register Vectorization for the Intel® Architecture , 2002, International Journal of Parallel Programming.

[6]  Peng Wu,et al.  Vectorization for SIMD architectures with alignment constraints , 2004, PLDI '04.

[7]  Jaewook Shin,et al.  Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture , 1999, ACM/IEEE SC 1999 Conference (SC'99).

[8]  Rainer Leupers Code selection for media processors with SIMD instructions , 2000, DATE '00.

[9]  Monica S. Lam,et al.  Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..

[10]  Andreas Krall,et al.  Compilation Techniques for Multimedia Processors , 2004, International Journal of Parallel Programming.

[11]  Jaewook Shin,et al.  Superword-level parallelism in the presence of control flow , 2005, International Symposium on Code Generation and Optimization.

[12]  Ruby B. Lee Subword parallelism with MAX-2 , 1996, IEEE Micro.

[13]  Gerald I. Cheong An Optimizer for Multimedia Instruction Sets , 2007 .

[14]  Derek J. DeVries A vectorizing SUIF compiler, implementation and performance , 1997 .

[15]  R. Leupers Code selection for media processors with SIMD instructions , 2000, Proceedings Design, Automation and Test in Europe Conference and Exhibition 2000 (Cat. No. PR00537).

[16]  Scott Mahlke,et al.  Exploiting Instruction Level Parallelism in the Presence of Conditional Branches , 1997 .

[17]  R. Govindarajan,et al.  A Vectorizing Compiler for Multimedia Extensions , 2000, International Journal of Parallel Programming.