Compiler Support for Sparse Tensor Computations in MLIR

Sparse tensors arise in problems in science, engineering, machine learning, and data analytics. Programs that operate on such tensors can exploit sparsity to reduce storage requirements and computational time. Developing and maintaining sparse software by hand, however, is a complex and error-prone task. Therefore, we propose treating sparsity as a property of tensors, not a tedious implementation task, and letting a sparse compiler generate sparse code automatically from a sparsity-agnostic definition of the computation. This article discusses integrating this idea into MLIR.
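To make the idea concrete, here is a minimal sketch in plain Python (not MLIR) contrasting a sparsity-agnostic definition of matrix-vector multiplication with the kind of loop nest a sparse compiler might derive once the matrix is declared to use compressed sparse row (CSR) storage. The function names `matvec_dense`, `to_csr`, and `matvec_csr` are hypothetical and chosen only for illustration. The CSR kernel is written by hand here as an assumption about what generated code could look like, not as the actual output of the compiler discussed in the article.

```python
def matvec_dense(a, x):
    """Sparsity-agnostic definition: y[i] = sum over j of a[i][j] * x[j]."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def to_csr(a):
    """Convert a dense row-major matrix to CSR arrays (illustrative helper).

    positions[i]..positions[i+1] delimits the stored entries of row i;
    indices holds their column numbers and values holds the nonzeros.
    """
    positions, indices, values = [0], [], []
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                indices.append(j)
                values.append(v)
        positions.append(len(indices))
    return positions, indices, values


def matvec_csr(positions, indices, values, x):
    """Same computation, but iterating only over the stored nonzeros."""
    y = [0.0] * (len(positions) - 1)
    for i in range(len(y)):
        for k in range(positions[i], positions[i + 1]):
            y[i] += values[k] * x[indices[k]]
    return y


a = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0],
     [0.0, 3.0, 0.0]]
x = [1.0, 2.0, 3.0]

y_dense = matvec_dense(a, x)
y_sparse = matvec_csr(*to_csr(a), x)
```

Both versions compute the same result; only the storage format and the loop structure differ. Under the approach described above, the programmer would write just the first form and annotate the matrix as sparse, leaving the second form to the compiler.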
