Abstractions for Specifying Sparse Matrix Data Transformations

Payal Nandy, Mary Hall (University of Utah); Eddie C. Davis, Catherine Olschanowsky (Boise State University); Mahdi S. Mohammadi, Wei He, Michelle Strout (University of Arizona)

Motivation
• The polyhedral model is well suited to affine computations – affine loop bounds, array access expressions, and transformations
• The polyhedral model is unsuitable for sparse matrix and unstructured mesh computations, which are non-affine:
  – array accesses of the form A[B[i]]
  – loop bounds of the form index[i] ≤ j < index[i+1]
• Key observation: the compiler can generate code for a run-time inspector and executor
  – Run-time inspection can reveal the mapping of iterations to array indices
  – It can potentially change the iteration or data space
