Combining compile-time and run-time parallelization[1]

This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high-quality compile-time analysis with low-cost run-time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler's automatic parallelization system. We present results of measurements on programs from two benchmark suites (SPECfp95 and the NAS sample benchmarks), which identify inherently parallel loops in these programs that are missed by the compiler. We characterize the remaining parallelization opportunities and find that most of the loops require run-time testing, analysis of control flow, or some combination of the two. We present a new compile-time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed not only to improve the results of compile-time parallelization, but also to produce low-cost, directed run-time tests that allow the system to defer binding of parallelization until run time when safety cannot be proven statically. We call this approach predicated array data-flow analysis. We augment array data-flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data-flow values. Predicated array data-flow analysis allows the compiler to derive "optimistic" data-flow values guarded by predicates; these predicates can be used to derive a run-time test guaranteeing the safety of parallelization.

[1] This work has been supported by DARPA Contract DABT63-95-C-0118 and NSF Contract ACI-9721368.
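To make the idea of a predicate-guarded run-time test concrete, here is a minimal sketch in Python. It is not the paper's algorithm or SUIF output: the loop, the function names (`safe_to_parallelize`, `run_loop`, `two_version_loop`), and the predicate are all assumptions chosen for illustration. The loop `a[i + k] = a[i] + 1` carries a dependence across iterations only when `0 < k < n`, so a compiler that cannot determine `k` statically could emit a cheap test and choose between a parallel and a serial version at run time. Parallel execution is emulated here by running the iterations in reverse order.

```python
def safe_to_parallelize(n, k):
    """Hypothetical run-time predicate for the loop a[i + k] = a[i] + 1.

    Assumes k >= 0. Iterations 0..n-1 read a[0..n-1] and write a[k..k+n-1];
    one iteration reads what another writes exactly when 0 < k < n.
    """
    return k == 0 or k >= n


def run_loop(a, n, k, order):
    """Execute the loop body over the iterations in the given order."""
    out = list(a)  # work on a copy so callers can compare results
    for i in order:
        out[i + k] = out[i] + 1
    return out


def two_version_loop(a, n, k):
    """Pick a loop version at run time based on the predicate."""
    if safe_to_parallelize(n, k):
        # Stand-in for a parallel loop: any iteration order is legal here,
        # so reversed order must give the same result as serial order.
        return run_loop(a, n, k, reversed(range(n))), "parallel"
    return run_loop(a, n, k, range(n)), "serial"


if __name__ == "__main__":
    n = 4
    a = [0] * (2 * n)
    for k in range(n + 1):
        result, version = two_version_loop(a, n, k)
        # The guarded version always matches serial semantics.
        assert result == run_loop(a, n, k, range(n))
        print(k, version)
```

In this sketch the test costs two comparisons, which reflects the low-cost, directed tests the abstract describes, as opposed to inspecting every array access at run time.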
