Loop and data transformations for sparse matrix code

This paper introduces three new compiler transformations for representing and transforming sparse matrix computations and their data representations. In cooperation with run-time inspection, our compiler derives transformed matrix representations and associated transformed code to implement a variety of representations targeting different architecture platforms. This systematic approach to combining code and data transformations on sparse computations extends a polyhedral transformation and code generation framework. It lets the compiler compose these transformations with other transformations to generate code that is on average within 5% of, and often exceeds, the performance of the manually tuned, high-performance sparse matrix libraries CUSP and OSKI. Additionally, the compiler-generated inspector codes are on average 1.5× faster than OSKI and perform comparably to CUSP.
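
The abstract pairs run-time inspection with transformed code. The following Python sketch is a hedged illustration of that inspector/executor split for a single representation change: converting CSR to ELL, the fixed-width padded format that CUSP also provides. The function names are hypothetical, and the sketch does not show the compiler's generated code or the paper's three transformations. The inspector runs once at run time to build the new representation. The executor is the SpMV loop rewritten to use it.

```python
"""Inspector/executor sketch: CSR -> ELL conversion plus SpMV.

Assumption: illustrative only; names and layout are hypothetical and
this is not the code produced by the compiler described above.
"""
from typing import List, Tuple


def spmv_csr(row_ptr: List[int], col: List[int], val: List[float],
             x: List[float]) -> List[float]:
    """Reference SpMV over the original CSR representation."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col[k]]
    return y


def inspect_csr_to_ell(row_ptr: List[int], col: List[int], val: List[float]
                       ) -> Tuple[int, List[List[int]], List[List[float]]]:
    """Inspector: copy each CSR row into a fixed-width ELL row.

    Short rows are padded with explicit zeros (column index 0), so every
    row has the same length as the longest row.
    """
    n = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n)), default=0)
    ell_col = [[0] * width for _ in range(n)]
    ell_val = [[0.0] * width for _ in range(n)]
    for i in range(n):
        for j, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_col[i][j] = col[k]
            ell_val[i][j] = val[k]
    return width, ell_col, ell_val


def spmv_ell(width: int, ell_col: List[List[int]], ell_val: List[List[float]],
             x: List[float]) -> List[float]:
    """Executor: SpMV over ELL.

    The inner loop has a fixed trip count of `width`. Padded entries
    multiply a stored 0.0, so they add nothing to finite results.
    """
    n = len(ell_col)
    y = [0.0] * n
    for i in range(n):
        for j in range(width):
            y[i] += ell_val[i][j] * x[ell_col[i][j]]
    return y


if __name__ == "__main__":
    # 3x3 matrix [[1, 0, 2], [0, 3, 0], [4, 5, 6]] in CSR form.
    row_ptr = [0, 2, 3, 6]
    col = [0, 2, 1, 0, 1, 2]
    val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x = [1.0, 1.0, 1.0]
    width, ell_col, ell_val = inspect_csr_to_ell(row_ptr, col, val)
    print(spmv_ell(width, ell_col, ell_val, x))  # [3.0, 3.0, 15.0]
    print(spmv_csr(row_ptr, col, val, x))        # [3.0, 3.0, 15.0]
```

The trade-off this sketch exposes is general rather than specific to the paper. The inspector's cost is paid once per matrix, while a regular, fixed-length inner loop is paid back on every executor call. Padding wastes memory and work when row lengths vary widely.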