Abstractions for Specifying Sparse Matrix Data Transformations

Payal Nandy, Mary Hall (University of Utah); Eddie C. Davis, Catherine Olschanowsky (Boise State University); Mahdi S. Mohammadi, Wei He, Michelle Strout (University of Arizona)

Motivation
• The polyhedral model is well suited to affine computations – affine loop bounds, array access expressions, and transformations
• The polyhedral model is unsuitable for sparse matrix and unstructured mesh computations, which are non-affine:
  – array accesses of the form A[B[i]]
  – loop bounds of the form index[i] ≤ j < index[i+1]
• Key observation: the compiler can generate code for a run-time inspector and executor
  – Run-time inspection can reveal the mapping of iterations to array indices
  – It can potentially change the iteration or data space
