Superword-level parallelism in the presence of control flow

In this paper, we describe how to extend the concept of superword-level parallelization (SLP), used for multimedia extension architectures, so that it can be applied in the presence of control flow constructs. Superword-level parallelization involves identifying scalar instructions in a large basic block that perform the same operation, and, if dependences do not prevent it, combining them into a superword operation on a multi-word object. A key insight is that we can use techniques related to optimizations for architectures supporting predicated execution, even for multimedia ISAs that do not provide hardware predication. We derive large basic blocks with predicated instructions to which SLP can be applied. We describe how to minimize overheads for superword predicates and re-introduce control flow for scalar operations. We discuss other extensions to SLP to address common features of real multimedia codes. We present automatically-generated performance results on 8 multimedia codes to demonstrate the power of this approach. We observe speedups ranging from 1.97X to 15.07X as compared to both sequential execution and SLP alone.

[1]  Ruby B. Lee Subword parallelism with MAX-2 , 1996, IEEE Micro.

[2]  Gerald I. Cheong An Optimizer for Multimedia Instruction Sets , 2007 .

[3]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[4]  Jaewook Shin,et al.  Compiler-controlled caching in superword register files for multimedia extension architectures , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[5]  Emmett Witchel,et al.  Increasing and detecting memory address congruence , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.

[6]  M. Schlansker,et al.  On Predicated Execution , 1991 .

[7]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[8]  James E. Smith,et al.  Vector instruction set support for conditional operations , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[9]  Saman P. Amarasinghe,et al.  Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.

[10]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[11]  Chun Chen,et al.  The architecture of the DIVA processing-in-memory chip , 2002, ICS '02.

[12]  Monica S. Lam,et al.  Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..

[13]  Chang Woo Kang,et al.  Implementation of a 256-bit wideword processor for the data-intensive architecture (DIVA) processing-in-memory (PIM) chip , 2002, Proceedings of the 28th European Solid-State Circuits Conference.

[14]  Jaewook Shin,et al.  Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture , 1999, ACM/IEEE SC 1999 Conference (SC'99).

[15]  Brad Calder,et al.  Phi-predication for light-weight if-conversion , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[16]  Andreas Krall,et al.  Compilation Techniques for Multimedia Processors , 2004, International Journal of Parallel Programming.

[17]  Derek J. DeVries A vectorizing SUIF compiler, implementation and performance , 1997 .

[18]  R. Govindarajan,et al.  A Vectorizing Compiler for Multimedia Extensions , 2000, International Journal of Parallel Programming.

[19]  Ken Kennedy,et al.  Conversion of control dependence to data dependence , 1983, POPL '83.

[20]  Jeanne Ferrante,et al.  On linearizing parallel code , 1985, POPL.

[21]  Aart J. C. Bik,et al.  Automatic Intra-Register Vectorization for the Intel® Architecture , 2002, International Journal of Parallel Programming.

[22]  Scott Mahlke,et al.  Exploiting Instruction Level Parallelism in the Presence of Conditional Branches , 1997 .