Automatic Generation of Efficient Linear Algebra Programs

The level of abstraction at which application experts reason about linear algebra computations and the level of abstraction used by developers of high-performance numerical linear algebra libraries do not match. The former is conveniently captured by high-level languages and libraries such as Matlab and Eigen, while the latter expresses the kernels included in the BLAS and LAPACK libraries. Unfortunately, the translation from a high-level computation to an efficient sequence of kernels is a task, far from trivial, that requires extensive knowledge of both linear algebra and high-performance computing. Internally, almost all high-level languages and libraries use efficient kernels; however, the translation algorithms are too simplistic and thus lead to a suboptimal use of said kernels, with significant performance losses. In order to both achieve the productivity that comes with high-level languages, and make use of the efficiency of low level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem and produces as output an efficient sequence of calls to high-performance kernels. In 25 application problems, the code generated by Linnea always outperforms Matlab, Julia, Eigen and Armadillo, with speedups up to and exceeding 10×.

[1]  Paolo Bientinesi,et al.  Knowledge-Based Automatic Generation of Partitioned Matrix Expressions , 2011, CASC.

[2]  David E. Bernholdt,et al.  Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models , 2005, Proceedings of the IEEE.

[3]  Andrés Marzal,et al.  Computing the K Shortest Paths: A New Algorithm and an Experimental Comparison , 1999, WAE.

[4]  Manuel Krebber,et al.  Non-linear Associative-Commutative Many-to-One Pattern Matching with Sequence Variables , 2017, ArXiv.

[5]  Martin Odersky,et al.  Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs , 2010, GPCE '10.

[6]  OderskyMartin,et al.  Lightweight modular staging , 2010 .

[7]  Paolo Bientinesi,et al.  The generalized matrix chain algorithm , 2018, CGO.

[8]  Robert A. van de Geijn,et al.  Families of algorithms related to the inversion of a Symmetric Positive Definite matrix , 2008, TOMS.

[9]  Paolo Bientinesi,et al.  A Domain-Specific Compiler for Linear Algebra Operations , 2012, VECPAR.

[10]  Alfred V. Aho,et al.  Optimal code generation for expression trees , 1975, STOC.

[11]  Nicholas J. Higham,et al.  INVERSE PROBLEMS NEWSLETTER , 1991 .

[12]  Henrik Barthels,et al.  MatchPy: Pattern Matching in Python , 2018, J. Open Source Softw..

[13]  Ulrich Rüde,et al.  Expression Templates Revisited: A Performance Analysis of Current Methodologies , 2011, SIAM J. Sci. Comput..

[14]  A. J. J. Dick An Introduction to Knuth-Bendix Completion , 1991, Comput. J..

[15]  Robert M. Gower,et al.  Randomized Quasi-Newton Updates Are Linearly Convergent Matrix Inversion Algorithms , 2016, SIAM J. Matrix Anal. Appl..

[16]  Robert A. van de Geijn,et al.  Representing linear algebra algorithms in code: the FLAME application program interfaces , 2005, TOMS.

[17]  Tze Meng Low,et al.  SPIRAL: Extreme Performance Portability , 2018, Proceedings of the IEEE.

[18]  Paolo Bientinesi,et al.  Efficient Pattern Matching in Python , 2017, PyHPC@SC.

[19]  Alfred V. Aho,et al.  Code generation using tree matching and dynamic programming , 1989, ACM Trans. Program. Lang. Syst..

[20]  Steven G. Johnson,et al.  The Design and Implementation of FFTW3 , 2005, Proceedings of the IEEE.

[21]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[22]  Nicholas J. Higham,et al.  Functions of matrices - theory and computation , 2008 .

[23]  Jack Dongarra,et al.  LAPACK Users' Guide, 3rd ed. , 1999 .

[24]  Robert A. van de Geijn,et al.  Goal-Oriented and Modular Stability Analysis , 2011, SIAM J. Matrix Anal. Appl..

[25]  Robert A. van de Geijn,et al.  Designing Linear Algebra Algorithms by Transformation: Mechanizing the Expert Developer , 2012, VECPAR.

[26]  Claude Gomez,et al.  Maple programs for generating efficient FORTRAN code for serial and vectorised machines , 1998 .

[27]  Tobias Nipkow,et al.  Term rewriting and all that , 1998 .

[28]  Paolo Bientinesi,et al.  A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels , 2014, VECPAR.

[29]  Daniele G. Spampinato,et al.  A basic linear algebra compiler for structured matrices , 2016, 2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[30]  Robert A. van de Geijn,et al.  FLAME: Formal Linear Algebra Methods Environment , 2001, TOMS.

[31]  Michael Stepp,et al.  Equality saturation: a new approach to optimization , 2009, POPL '09.

[32]  Jeffrey D. Ullman,et al.  The Generation of Optimal Code for Arithmetic Expressions , 1970, JACM.

[33]  Nisheeth K. Vishnoi,et al.  On a Natural Dynamics for Linear Programming , 2015, ITCS.

[34]  Nils J. Nilsson,et al.  Principles of Artificial Intelligence , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Robert A. van de Geijn,et al.  The science of deriving dense linear algebra algorithms , 2005, TOMS.

[36]  Xin-She Yang,et al.  Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.

[37]  T. Başar,et al.  A New Approach to Linear Filtering and Prediction Problems , 2001 .

[38]  Elizabeth R. Jessup,et al.  Build to order linear algebra kernels , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[39]  Luis Tenorio,et al.  Stochastic Newton and Quasi-Newton Methods for Large Linear Least-squares Problems , 2017, ArXiv.

[40]  Jan Vitek,et al.  Julia: dynamism and performance reconciled by design , 2018, Proc. ACM Program. Lang..

[41]  Raja Giryes,et al.  Image Restoration by Iterative Denoising and Backward Projections , 2017, IEEE Transactions on Image Processing.

[42]  Paolo Bientinesi,et al.  Automatic Generation of Loop-Invariants for Matrix Operations , 2011, 2011 International Conference on Computational Science and Its Applications.

[43]  BientinesiPaolo,et al.  Representing linear algebra algorithms in code: the FLAME application program interfaces , 2005 .

[44]  Conrad Sanderson,et al.  Armadillo: An Open Source C++ Linear Algebra Library for Fast Prototyping and Computationally Intensive Experiments , 2010 .

[45]  Paolo Bientinesi,et al.  Program generation for small-scale linear algebra applications , 2018, CGO.

[46]  Gene H. Golub,et al.  Tikhonov Regularization and Total Least Squares , 1999, SIAM J. Matrix Anal. Appl..

[47]  Ivan W. Selesnick,et al.  Sparsity-based correction of exponential artifacts , 2015, Signal Process..

[48]  Paolo Bientinesi,et al.  MatchPy: A Pattern Matching Library , 2017, SciPy.

[49]  Eduardo Pelegrí-Llopart,et al.  Optimal code generation for expression trees: an application BURS theory , 1988, POPL '88.

[50]  Jack J. Dongarra,et al.  A set of level 3 basic linear algebra subprograms , 1990, TOMS.

[51]  Martin D. Schatz,et al.  DxTer: An Extensible Tool for Optimal Dataflow Program Generation , 2015 .

[52]  Xinwei Deng,et al.  A Parallel Implementation of the Ensemble Kalman Filter Based on Modified Cholesky Decomposition , 2019, J. Comput. Sci..

[53]  Rong-Qing Jia,et al.  Smoothness of Multiple Refinable Functions and Multiple Wavelets , 1999, SIAM J. Matrix Anal. Appl..