Techniques for Reducing the Overhead of Run-Time Parallelization

Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because those loops have access patterns that are complex or insufficiently defined at compile time. Because such loops arise frequently in practice, we have introduced a novel framework for identifying them: speculative parallelization. While we have previously shown that this method is inherently scalable, its practical success depends on the fraction of ideal speedup that can be obtained on modest to moderately large parallel machines. Maximum speedup can be obtained only by minimizing the run-time overhead of the method, which in turn depends on its level of integration within a classical restructuring compiler and on its adaptation to the characteristics of the parallelized application. We present several compiler and run-time techniques designed specifically to optimize the run-time parallelization of sparse applications. We show how the run-time overhead of speculative parallelization can be minimized by using static control flow information to reduce the number of memory references that must be collected at run time. We then present heuristics that speculate on the type of the data structures used by the program, thereby reducing the memory required for tracing sparse access patterns. Finally, we describe an implementation in the Polaris compiler infrastructure and report experimental results.
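
To make the speculation idea concrete, below is a minimal sketch of an LRPD-style run-time test for a loop whose accesses go through indirection arrays unknown until run time. All names (a, w, r, wr, rd) are illustrative, and the marking loop is written sequentially here for clarity; this is not the paper's implementation, which executes the speculative phase in parallel with per-processor shadow structures that are merged during the analysis phase.

```c
/* Minimal sketch of LRPD-style speculative parallelization for a loop
 * of the form  a[w[i]] += a[r[i]],  where the indirection arrays w[]
 * and r[] are only known at run time. Illustrative only. */
#include <stdio.h>
#include <string.h>

#define N 8      /* loop trip count        */
#define M 16     /* size of the data array */

int main(void) {
    double a[M], backup[M];
    int w[N] = {0, 2, 4, 6, 8, 10, 12, 14};   /* write indices */
    int r[N] = {1, 3, 5, 7, 9, 11, 13, 15};   /* read  indices */
    unsigned char wr[M] = {0}, rd[M] = {0};   /* shadow (mark) arrays */

    for (int i = 0; i < M; i++) a[i] = (double)i;
    memcpy(backup, a, sizeof a);              /* checkpoint for rollback */

    /* Speculative phase: execute the loop (in parallel, conceptually)
     * while marking every element read and written through the
     * indirection arrays. */
    for (int i = 0; i < N; i++) {
        rd[r[i]] = 1;
        wr[w[i]]++;                           /* count writes per element */
        a[w[i]] += a[r[i]];
    }

    /* Analysis phase: the loop was fully parallel iff no element was
     * written more than once and no element was both read and written. */
    int ok = 1;
    for (int j = 0; j < M; j++)
        if (wr[j] > 1 || (wr[j] && rd[j])) { ok = 0; break; }

    if (!ok) {                                /* mis-speculation: rollback */
        memcpy(a, backup, sizeof a);
        for (int i = 0; i < N; i++)           /* re-execute sequentially */
            a[w[i]] += a[r[i]];
    }
    printf("speculation %s\n", ok ? "succeeded" : "failed; re-ran serially");
    return 0;
}
```

The cost of such a test is dominated by the marking of the shadow arrays and by their memory footprint. The techniques described above attack exactly these two terms: static control flow information lets the compiler omit marking operations it can prove redundant, and speculating on the type of the program's data structures allows the shadow structures to be stored in a more compact form than one mark per array element.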
