Compile-Time Characterization of Recurrent Patterns in Irregular Computations

Many engineering applications use irregular array access functions. Compile-time characterization of the computation structure of such applications is infeasible, making the automatic generation of efficient communication difficult. This paper considers the Inspector-Executor (IE) compilation model for handling such applications. In this model, a run-time inspection of the computation is followed by the actual execution. This paper discusses the issue of automatic compile-time identification of sections of a program that have recurrent computational patterns, so that the cost of run-time inspection can be amortized over many executions. An algorithm is presented and an illustrative example is given.
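
The Inspector-Executor idea can be illustrated with a minimal, single-process sketch. It is not the algorithm of the paper: the function names (`inspector`, `executor`, `run`), the block ownership range `[lo, hi)`, and the simulated "communication" through a dictionary are all assumptions made for illustration. The point it shows is that when the indirection array does not change between time steps, the inspection can run once and its result can be reused by every execution.

```python
def inspector(indices, lo, hi):
    """Scan the indirection array once and record which referenced
    elements lie outside the locally owned block [lo, hi)."""
    return sorted({j for j in indices if not lo <= j < hi})


def executor(x_global, indices, lo, hi, schedule):
    """Run the irregular loop sum(x[indices[i]]) using a precomputed schedule.

    The gather below stands in for inter-processor communication
    (hypothetical; a real system would exchange messages here).
    """
    ghost = {j: x_global[j] for j in schedule}  # simulated fetch of remote data
    local = x_global[lo:hi]                     # data owned by this process
    return sum(local[j - lo] if lo <= j < hi else ghost[j] for j in indices)


def run(x_global, indices, lo, hi, steps):
    """Inspect once, then execute `steps` times with the same access pattern."""
    schedule = inspector(indices, lo, hi)       # cost paid a single time
    results = []
    for _ in range(steps):
        results.append(executor(x_global, indices, lo, hi, schedule))
        # Values change between steps, but the access pattern does not.
        x_global = [v + 1.0 for v in x_global]
    return schedule, results


if __name__ == "__main__":
    x = [float(i) for i in range(8)]
    ia = [0, 5, 2, 7, 5]                        # irregular access pattern
    print(run(x, ia, lo=0, hi=4, steps=3))
```

In this sketch the caller decides by hand that the pattern is recurrent and hoists the `inspector` call out of the loop. Making that decision automatically at compile time is the problem the abstract describes.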
