Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler

This paper presents an extensive empirical evaluation of an interprocedural parallelizing compiler developed as part of the Stanford SUIF compiler system. The system incorporates a comprehensive and integrated collection of analyses, including privatization and reduction recognition for both array and scalar variables, and symbolic analysis of array subscripts. The interprocedural analysis framework is designed to provide results nearly as precise as full inlining, but without its associated costs. Experimentation with this system shows that it can detect coarser-grained parallelism than was previously possible. Specifically, it can parallelize loops that span numerous procedures and hundreds of lines of code, and that frequently require modifications to array data structures, such as privatization and reduction transformations. Measurements from several standard benchmark suites demonstrate that an integrated combination of interprocedural analyses can substantially advance the capability of automatic parallelization technology.
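To make the two transformations named above concrete, the following is a minimal sketch and not the SUIF implementation. It assumes a loop whose body calls a procedure that uses a scratch array and whose results are summed. All names (`smooth`, `run_sequential`, `run_privatized`, `workers`) are hypothetical, and the "workers" are simulated one after another rather than run on real threads.

```python
# Illustrative sketch only; every name here is hypothetical.

def smooth(row, tmp):
    """Callee: writes every element of the scratch array before reading it."""
    n = len(row)
    for j in range(n):
        tmp[j] = row[j] * 2
    return sum(tmp[j] + tmp[n - 1 - j] for j in range(n))

def run_sequential(grid):
    tmp = [0] * len(grid[0])  # one scratch array shared by all iterations
    total = 0                 # accumulator updated in every iteration
    for row in grid:          # the loop body spans a procedure call
        total += smooth(row, tmp)
    return total

def run_privatized(grid, workers=2):
    partials = []
    for w in range(workers):          # each pass stands in for one worker
        tmp = [0] * len(grid[0])      # privatization: a scratch array per worker
        partial = 0                   # reduction: a partial sum per worker
        for row in grid[w::workers]:  # this worker's share of the iterations
            partial += smooth(row, tmp)
        partials.append(partial)
    return sum(partials)              # combine the partial sums after the loop
```

In the sequential version, the shared `tmp` and `total` make the iterations appear dependent on one another. Seeing that `tmp` is fully overwritten inside the callee before it is read requires looking across the call boundary, which is the kind of fact an interprocedural analysis would need to establish before giving each worker its own copy.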
