Hybrid Dependence Analysis for Automatic Parallelization

Automatic program parallelization has been an elusive goal for many years. It has recently become more important due to the widespread introduction of multi-core processors in PCs. Automatic parallelization could not be achieved because classic compiler analysis was not powerful enough and because program behavior was found to be, in many cases, input dependent. Run-time thread-level parallelization, introduced in 1995, was a welcome but somewhat different avenue for advancing parallelization coverage. In this paper we introduce a novel analysis, Hybrid Analysis (HA), which unifies static and dynamic memory reference techniques into a seamless compiler framework that extracts almost the maximum available parallelism from scientific codes while generating minimal run-time overhead. We show how to extract the maximum information from quantities that cannot be sufficiently analyzed through static compiler methods, and how to generate sufficient conditions which, when evaluated dynamically, can validate optimizations. A large number of experiments confirm the viability of our techniques, which have been implemented in the Polaris compiler.
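The idea of a sufficient condition that is derived statically and evaluated dynamically can be illustrated with a minimal sketch. It is not the Polaris implementation; the loop, the predicate, and the names `independent` and `run_loop` are assumptions made for illustration. The loop writes `a[i + k]` and reads `a[i]`, where the offset `k` is unknown at compile time and assumed non-negative. A cross-iteration dependence can exist only when `0 < k < n`, so the test `k == 0 or k >= n` is sufficient to validate parallel execution.

```python
from concurrent.futures import ThreadPoolExecutor


def independent(n, k):
    # Hypothetical run-time predicate (assumes k >= 0): iteration i reads a[i]
    # and writes a[i + k]. Two different iterations touch the same element
    # only if 0 < k < n, so this condition is sufficient for independence.
    return k == 0 or k >= n


def run_loop(a, n, k):
    # Assumes len(a) >= n + k so that every index is in bounds.
    def body(i):
        a[i + k] = a[i] + 1

    if independent(n, k):
        # The predicate held at run time: iterations may run in any order.
        with ThreadPoolExecutor() as pool:
            list(pool.map(body, range(n)))
        return "parallel"

    # The predicate failed: fall back to the original sequential loop.
    for i in range(n):
        body(i)
    return "sequential"
```

The predicate costs a constant number of comparisons, whereas inspecting every memory reference at run time would cost time proportional to the number of accesses. The thread pool only stands in for a parallel loop schedule here; it is not meant to demonstrate speedup in Python.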
