Compile-Time Characterization of Recurrent Patterns in Irregular Computations

Many engineering applications use irregular array access functions. Compile-time characterization of the computation structure of such applications is infeasible, making the automatic generation of efficient communication difficult. This paper considers the Inspector-Executor (IE) compilation model to handle such applications. In this model, a run-time inspection of the computation is followed by the actual execution. This paper discusses the issue of automatic compile-time identification of sections of a program that have recurrent computational patterns, so that the cost of run-time inspection can be amortized over many executions. An algorithm is presented and an illustrative example is given.
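The Inspector-Executor idea summarized above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the block distribution, the function names `owner`, `inspect`, and `execute`, and the use of plain Python lists in place of message passing are all assumptions made for the example. The inspector scans an irregular index array once to find which remote elements a process needs; the executor then reuses that schedule on every iteration in which the access pattern recurs.

```python
from collections import defaultdict

def owner(g, block):
    """Process that owns global element g under an assumed block distribution."""
    return g // block

def inspect(idx, me, block):
    """Inspector: scan the index array once and build a fetch schedule.

    Returns a dict mapping each remote process to the sorted list of global
    indices that process `me` must fetch from it.
    """
    needed = defaultdict(set)
    for g in idx:
        p = owner(g, block)
        if p != me:
            needed[p].add(g)
    return {p: sorted(s) for p, s in needed.items()}

def execute(x_parts, idx, me, block, schedule):
    """Executor: fetch remote values per the schedule, then do the gather."""
    ghost = {}
    for p, wanted in schedule.items():
        for g in wanted:
            # A list lookup stands in for a received message (assumption).
            ghost[g] = x_parts[p][g - p * block]
    result = []
    for g in idx:
        if owner(g, block) == me:
            result.append(x_parts[me][g - me * block])
        else:
            result.append(ghost[g])
    return result

# Two processes, four elements each; process 0 reads x through idx.
block = 4
idx = [0, 5, 2, 7, 5]
schedule = inspect(idx, me=0, block=block)  # inspection is paid for once

x_parts = [[0, 10, 20, 30], [40, 50, 60, 70]]
first = execute(x_parts, idx, 0, block, schedule)

# The data changes but the access pattern recurs, so the schedule is reused.
x_parts = [[v + 1 for v in part] for part in x_parts]
second = execute(x_parts, idx, 0, block, schedule)
```

Because `idx` does not change between the two calls, the inspection cost is spread over both executions. Identifying at compile time the program sections where this reuse is safe is the problem the paper addresses.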
