EGGS: Sparsity‐Specific Code Generation

Sparse matrix computations are among the most important computational patterns, commonly used in geometry processing, physical simulation, graph algorithms, and other settings where sparse data arises. In many cases, the structure of a sparse matrix is known a priori, but its values may change or depend on inputs to the algorithm. We propose a new methodology for compile‐time specialization of algorithms that mix sparse and dense linear algebra operations, implemented as an extension to the widely used open source Eigen package. In contrast to library approaches that optimize individual building blocks of a computation (such as the sparse matrix product), we generate reusable sparsity‐specific implementations for a given algorithm, utilizing vector intrinsics and reducing unnecessary scanning through matrix structures. We quantitatively evaluate the benefit of our approach over the state‐of‐the‐art library Intel MKL on a benchmark of artificial expressions. To demonstrate its practical applicability, we further show that our technique can improve performance, with minimal code changes, for mesh smoothing, mesh parametrization, volumetric deformation, optical flow, and computation of the Laplace operator.
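
The idea of specializing for a known sparsity pattern can be illustrated with a minimal C++ sketch. It is not the EGGS implementation and does not use Eigen; the `Csr` struct, both function names, and the fixed 3x3 pattern are assumptions chosen for illustration only. The generic routine scans index arrays at run time, while the hand-written "specialized" routine stands in for code a generator might emit once the pattern is fixed and only the values vary.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical CSR container (not Eigen's type): structure plus values.
struct Csr {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<double> values;        // size nnz
};

// Generic sparse matrix-vector product: the index arrays are
// scanned at run time on every call.
std::vector<double> spmv_generic(const Csr& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r) {
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            y[r] += a.values[k] * x[a.col_idx[k]];
        }
    }
    return y;
}

// Hand-written stand-in for a sparsity-specific kernel, assuming the
// pattern below is known ahead of time and only the values v change:
//   [ v0  v1  .  ]
//   [ .   v2  .  ]
//   [ v3  .   v4 ]
// No index arrays are read; the structure is baked into the code.
std::array<double, 3> spmv_specialized(const std::array<double, 5>& v,
                                       const std::array<double, 3>& x) {
    std::array<double, 3> y{};
    y[0] = v[0] * x[0] + v[1] * x[1];
    y[1] = v[2] * x[1];
    y[2] = v[3] * x[0] + v[4] * x[2];
    return y;
}
```

Both routines compute the same product for the same values; the specialized one trades generality for straight-line code that a compiler can vectorize and schedule freely.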
