Deriving Correct High-Performance Algorithms

Dijkstra observed that verifying the correctness of a program is difficult and conjectured that deriving a program hand in hand with its proof of correctness was the answer. We illustrate this goal-oriented approach by applying it to dense linear algebra libraries for distributed-memory parallel computers. We show that the algorithms underlying the implementation of most functionality in this domain can be systematically derived to be correct. A further benefit is that an entire family of algorithms for an operation is discovered, so that the best algorithm for a given architecture can be chosen. The approach is practical: ideas inspired by it have been used to rewrite the dense linear algebra software stack, starting below the Basic Linear Algebra Subprograms (BLAS), reaching up through the Elemental distributed-memory library, and covering every level in between. The paper demonstrates how formal methods and rigorous mathematical techniques for correctness impact high-performance computing (HPC).
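
To make the idea of a "family of algorithms" concrete, the following is a minimal sketch, not the notation or tooling used in the paper. It assumes a small matrix-vector product y = A x as the operation, and the function names (partial_product, matvec_by_rows, matvec_by_columns) are hypothetical. Each variant is written around a different loop invariant, which is asserted at the top of every iteration, so the loop body is whatever is needed to keep that invariant true while making progress.

```python
def partial_product(A, x, rows, cols):
    """Reference value of A[:rows, :cols] times x[:cols], zero-padded to len(A).

    Used only to state the loop invariants below; integer inputs are
    assumed so that equality checks are exact.
    """
    y = [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]
    return y + [0] * (len(A) - rows)


def matvec_by_rows(A, x):
    """Variant 1: the first k entries of y are final, the rest are still zero."""
    m, n = len(A), len(x)
    y = [0] * m
    for k in range(m):
        # Loop invariant: y holds the result for rows 0..k-1 only.
        assert y == partial_product(A, x, k, n)
        y[k] = sum(A[k][j] * x[j] for j in range(n))  # dot product with row k
    # Invariant plus loop exit (k == m) implies the postcondition y == A x.
    assert y == partial_product(A, x, m, n)
    return y


def matvec_by_columns(A, x):
    """Variant 2: y accumulates the contribution of the first k columns."""
    m, n = len(A), len(x)
    y = [0] * m
    for k in range(n):
        # Loop invariant: y equals A[:, :k] times x[:k].
        assert y == partial_product(A, x, m, k)
        for i in range(m):
            y[i] += A[i][k] * x[k]  # add x[k] times column k
    # Invariant plus loop exit (k == n) implies the postcondition y == A x.
    assert y == partial_product(A, x, m, n)
    return y


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    x = [1, -1]
    print(matvec_by_rows(A, x))
    print(matvec_by_columns(A, x))
```

Both variants compute the same result but access memory differently (row-wise dot products versus column-wise updates), which is the kind of difference that can make one member of a family preferable on a given architecture.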
