Array Dataflow Analysis in Presence of Non-affine Constraints

Array dataflow dependence analysis is essential for automatic parallelization. Describing dependences at the level of individual operations and array elements has been shown to significantly improve the output of many code optimizations. This kind of analysis, however, has two main drawbacks: its high cost and its restriction to a small class of programs. We first describe a new polynomial-time algorithm that outperforms other current methods in both complexity and application domain. Then, continuing the work of J.-F. Collard, we present a general framework that handles any kind of dependence, producing approximate dependences where necessary. The program model is extended to any reducible control graph and any kind of array element reference. An original method, called iterative analysis, finds relations between non-affine constraints in order to improve the accuracy of the result. We also provide a criterion ensuring that the approximation obtained is the best possible with respect to the information gathered on non-affine constraints by other analyses. Finally, several traditional applications of dataflow analysis are adapted to our method to take advantage of its results; in particular, we detail an array expansion that is a trade-off between run-time overhead, memory requirements, and degree of parallelism.
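To make the notion of a dependence "at the operation and array element level" concrete, here is a minimal sketch that finds, for each read, the operation that last wrote the cell being read. It does so by brute-force simulation of a toy loop, which is not the polynomial-time algorithm or the framework described above. The loop, the statement names `S1` and `S2`, and the subscript `(i*i) % n` are all hypothetical, chosen only because the subscript is non-affine.

```python
def sources(n):
    """Brute-force value-based dependences for the hypothetical loop

        for i in range(n):
            S1: a[i] = ...
            S2: ... = a[(i*i) % n]   # non-affine subscript

    Returns a map from each read operation ("S2", i) to the operation
    that produced the value it reads, or None if no operation in the
    loop wrote that cell before the read.
    """
    last_writer = {}  # array cell -> (statement, iteration) of latest write
    result = {}
    for i in range(n):
        last_writer[i] = ("S1", i)            # S1 writes a[i]
        cell = (i * i) % n                    # cell read by S2
        result[("S2", i)] = last_writer.get(cell)
    return result


if __name__ == "__main__":
    for read, source in sources(5).items():
        print(read, "<-", source)
```

Enumeration of this kind only works for a fixed, small `n`. A static analysis has to describe the same relation symbolically for all `n`, and a subscript such as `(i*i) % n` is the sort of non-affine constraint for which only an approximate set of possible sources may be available.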
