Sensitivity analysis for automatic parallelization on multi-cores

Sensitivity Analysis (SA) is a novel compiler technique that complements, and integrates with, static automatic parallelization analysis in cases where the relevant program behavior is input sensitive. In this paper we show how SA can extract all the input-dependent, statically unavailable conditions under which loops can be dynamically parallelized. SA generates a sequence of sufficient conditions which, when evaluated dynamically in order of increasing complexity, can each validate the parallel execution of the corresponding loop. For example, SA may first attempt to validate parallelization by checking simple conditions related to the loop bounds. If such simple conditions cannot be met, validating dynamic parallelization may require evaluating conditions over the entire memory reference trace of the loop, which reduces the benefit of parallel execution. We have implemented Sensitivity Analysis in the Polaris compiler and evaluated its performance on 22 industry-standard benchmark codes running on two multicore systems. In most cases we obtained speedups superior to those of the Intel Ifort compiler, because SA allowed us to complement static analysis with minimum-cost dynamic analysis and extract most of the available coarse-grained parallelism.
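The runtime cascade described above can be illustrated with a minimal sketch. This is not the paper's implementation: the loop shape (`A(i) = A(i+k) + 1` with a symbolic offset `k`), the function names (`cheap_condition`, `trace_condition`, `sa_execute`), and the use of a thread pool are all illustrative assumptions; only the idea of ordering sufficient conditions by cost, with a trace-based check as the expensive fallback, comes from the abstract.

```python
# Hypothetical sketch of SA's runtime validation cascade for a loop of
# the form  DO i = 0, n-1:  A(i) = A(i+k) + 1,  where the offset k is
# unknown at compile time. Conditions are tried cheapest first.

from concurrent.futures import ThreadPoolExecutor

def cheap_condition(k, n):
    """O(1) check from the loop bounds alone: if k == 0 each iteration
    touches only its own element, and if |k| >= n the read falls outside
    the written region, so no cross-iteration dependence can exist."""
    return k == 0 or abs(k) >= n

def trace_condition(k, n):
    """O(n) fallback: walk the memory reference trace (LRPD-style
    shadowing) and fail if any iteration reads an element written by a
    different iteration."""
    written_by = {i: i for i in range(n)}   # element index -> writing iteration
    for i in range(n):
        r = i + k                           # element read by iteration i
        if r in written_by and written_by[r] != i:
            return False
    return True

def sa_execute(A, k):
    """Run the loop in parallel if some sufficient condition validates it,
    otherwise fall back to sequential execution."""
    n = len(A)
    body = lambda i: A[i + k] + 1 if 0 <= i + k < n else A[i]
    if cheap_condition(k, n) or trace_condition(k, n):
        with ThreadPoolExecutor() as pool:  # validated: parallel execution
            return list(pool.map(body, range(n)))
    return [body(i) for i in range(n)]      # not validated: sequential
```

The design point the sketch makes is the ordering: `cheap_condition` costs a few comparisons, so it is always tried first, and the loop pays the O(n) cost of `trace_condition` only when the cheap test is inconclusive.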
