Compiler and Run-Time Support for Exploiting Regularity within Irregular Applications

This paper starts from a well-known idea, that structure in irregular problems improves sequential performance, and aims to show that the same structure can also be exploited to parallelize irregular problems on a distributed-memory multicomputer. In particular, we extend a well-known parallelization technique called run-time compilation to use structure information that is explicit in the array subscripts. We present a number of internal representations suited to particular access patterns and show how preprocessing structures such as translation tables, trace arrays, and interprocessor communication schedules can be encoded in terms of one or more of these representations. We show that loop and index normalization are important for detecting irregularity in array references, as well as locality in such references. We present methods for detecting irregularity, determining the feasibility of inspection, and placing inspectors and interprocessor communication schedules. We show that this process can be automated through extensions to an HPF/Fortran-77 distributed-memory compiler (PARADIGM) and new run-time support for irregular problems (PILAR) that uses a variety of internal representations of communication patterns. We devise performance measures that relate the inspection cost, the execution cost, and the number of times the executor is invoked, so that competing schemes can be compared independently of the number of iterations. Finally, we present experimental results on an IBM SP-2 that validate our approach. These results show that dramatic improvements in both memory requirements and execution time can be achieved by using these techniques.
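As a rough illustration of why a representation matched to the access pattern can save memory, the sketch below compresses a sorted list of referenced array indices into runs of consecutive values. It is a minimal sketch under stated assumptions, not the PILAR interface: the function names `to_intervals` and `expand` and the `(start, length)` encoding are hypothetical and chosen only for this example.

```python
# Hypothetical sketch: names and encoding are assumptions, not PILAR's API.

def to_intervals(indices):
    """Compress sorted indices into (start, length) runs of consecutive values."""
    intervals = []
    for i in indices:
        if intervals and intervals[-1][0] + intervals[-1][1] == i:
            # i extends the current run by one element
            start, length = intervals[-1]
            intervals[-1] = (start, length + 1)
        else:
            # i starts a new run
            intervals.append((i, 1))
    return intervals


def expand(intervals):
    """Recover the enumerated index list from its interval encoding."""
    out = []
    for start, length in intervals:
        out.extend(range(start, start + length))
    return out


if __name__ == "__main__":
    referenced = [4, 5, 6, 7, 20, 21, 40]
    runs = to_intervals(referenced)
    print(runs)
    print(expand(runs) == referenced)
```

When references cluster into long consecutive runs, the interval list holds far fewer entries than the enumerated list; when they do not, the encoding degenerates to one entry per index, which is why a choice among several representations is useful.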
