Automatic Discovery of Coarse-Grained Parallelism in Media Applications

As multi-core microprocessors and hardware accelerators become more common in embedded media processing systems, there is a growing need to discover coarse-grained parallelism in media applications written in C and C++. Common versions of these codes use a pointer-heavy, sequential programming model to implement algorithms with high levels of inherent parallelism. The lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application-specific hardware designers, and has inhibited the development of automatic parallelizing compilers. Automatic discovery is challenging because of shifts in the prevalent programming languages, scalability problems in analysis techniques, and the lack of experimental research on combining the numerous analyses needed to obtain a clear view of the relations among memory accesses in complex programs. This paper is based on a coherent prototype system designed to automatically find multiple levels of coarse-grained parallelism. It examines several of the key analyses needed to discover parallelism in contemporary media applications, distinguishing those that already perform satisfactorily from those that do not yet have practical, scalable solutions. We show that, contrary to common belief, a compiler with a strong, synergistic portfolio of modern analysis capabilities can automatically discover a very substantial amount of coarse-grained parallelism in complex media applications such as an MPEG-4 encoder. These results suggest that an automatic coarse-grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems.
