Paper information - Loop and data transformations for sparse matrix code

Loop and data transformations for sparse matrix code

This paper introduces three new compiler transformations for representing and transforming sparse matrix computations and their data representations. In cooperation with run-time inspection, our compiler derives transformed matrix representations and associated transformed code to implement a variety of representations targeting different architecture platforms. This systematic approach to combining code and data transformations on sparse computations, which extends a polyhedral transformation and code generation framework, permits the compiler to compose these transformations with other transformations. The generated code is on average within 5% of, and often exceeds, the performance of the manually tuned, high-performance sparse matrix libraries CUSP and OSKI. Additionally, the compiler-generated inspector codes are on average 1.5× faster than OSKI's and perform comparably to CUSP's.
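The interplay between run-time inspection and transformed code described in the abstract can be illustrated with a small inspector/executor sketch. This is not the paper's generated code: the function names (`csr_spmv`, `inspect_csr_to_ell`, `ell_spmv`) and the choice of a padded ELL-style target layout are assumptions made purely for illustration. The inspector examines a CSR matrix at run time and builds a padded representation; the executor then runs sparse matrix-vector multiplication over that new layout with a fixed inner trip count.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference SpMV over the original CSR representation."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def inspect_csr_to_ell(row_ptr, col_idx, vals):
    """Inspector (hypothetical): walk the CSR arrays once at run time and
    build a padded, fixed-width layout. Padding slots use column 0 and
    value 0.0, so they contribute nothing to the result."""
    n_rows = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)), default=0)
    ell_cols = [[0] * width for _ in range(n_rows)]
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    for i in range(n_rows):
        for slot, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_cols[i][slot] = col_idx[k]
            ell_vals[i][slot] = vals[k]
    return ell_cols, ell_vals


def ell_spmv(ell_cols, ell_vals, x):
    """Executor (hypothetical): every row has the same number of slots,
    so the inner loop bound no longer depends on the row."""
    y = []
    for cols, row_vals in zip(ell_cols, ell_vals):
        acc = 0.0
        for c, v in zip(cols, row_vals):
            acc += v * x[c]
        y.append(acc)
    return y


# 3x3 example matrix: [[1, 0, 2], [0, 3, 0], [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0]

ell_cols, ell_vals = inspect_csr_to_ell(row_ptr, col_idx, vals)
y_csr = csr_spmv(row_ptr, col_idx, vals, x)
y_ell = ell_spmv(ell_cols, ell_vals, x)
```

The cost of the inspector is paid once per matrix, which is why inspector speed is reported separately from executor speed in comparisons such as the one against OSKI and CUSP above.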

Mary W. Hall | Anand Venkat | Michelle Mills Strout