A Modern Framework for Portable High-Performance Numerical Linear Algebra

In this chapter, we present a generic programming methodology for expressing the data structures and algorithms of numerical linear algebra, together with a high-performance implementation of this approach, the Matrix Template Library (MTL). As with the Standard Template Library, our approach is five-fold, consisting of generic functions, containers, iterators, adapters, and function objects, all developed specifically for high-performance numerical linear algebra. Within our framework, the generic functions correspond to the mathematical operations that define linear algebra, while the containers, adapters, and iterators represent and manipulate concrete linear algebra objects such as matrices and vectors. To many scientific computing users, however, the advantages of an elegant programming interface are secondary to performance. We achieved high performance in the MTL in two ways. The first is the use of static polymorphism (function templates) with modern optimizing compilers, which provides flexibility with no loss in performance. The second is the application of abstraction to the optimization process itself: the Basic Linear Algebra Instruction Set (BLAIS) is presented as an abstract interface to several important performance optimizations. Our experimental results show that MTL with the BLAIS achieves performance that is as good as, or better than, that of vendor-tuned libraries, even though MTL and the BLAIS are written completely in C++. We therefore conclude that, contrary to conventional wisdom, the use of abstraction is not a barrier to performance, and that certain abstractions can in fact facilitate optimization. In addition, MTL requires orders of magnitude fewer lines of code for its implementation, with the concomitant savings in development and maintenance effort.
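To make the role of static polymorphism concrete, the following is a minimal sketch of the STL-style division of labor described above: a generic function, a function object, and an iterator adapter, all resolved at compile time. Every name in it (`generic_dot`, `generic_add`, `scale_by`, `transform_iter`, `scaled`, `dot`) is hypothetical and chosen for illustration only; none of it is taken from the MTL or the BLAIS, whose actual interfaces are not shown in this abstract.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Illustrative sketch only: these names are assumptions, not the MTL API.

// Function object: scaling by a constant, small enough for the compiler to inline.
template <class T>
struct scale_by {
    T alpha;
    T operator()(const T& x) const { return alpha * x; }
};

// Iterator adapter: applies a function object each time it is dereferenced,
// so a "scaled vector" never has to be stored in memory.
template <class Iter, class Op>
class transform_iter {
public:
    typedef typename std::iterator_traits<Iter>::value_type value_type;
    transform_iter(Iter it, Op op) : it_(it), op_(op) {}
    value_type operator*() const { return op_(*it_); }
    transform_iter& operator++() { ++it_; return *this; }
    bool operator!=(const transform_iter& other) const { return it_ != other.it_; }
private:
    Iter it_;
    Op op_;
};

// Helper that deduces the adapter type from its arguments.
template <class Iter, class T>
transform_iter<Iter, scale_by<T>> scaled(Iter it, T alpha) {
    return transform_iter<Iter, scale_by<T>>(it, scale_by<T>{alpha});
}

// Generic function: inner product over any two iterator ranges.
template <class InIter1, class InIter2, class T>
T generic_dot(InIter1 first1, InIter1 last1, InIter2 first2, T init) {
    for (; first1 != last1; ++first1, ++first2)
        init = init + (*first1) * (*first2);
    return init;
}

// Generic function: y <- y + x, element by element.
template <class InIter, class OutIter>
void generic_add(InIter first, InIter last, OutIter y) {
    for (; first != last; ++first, ++y)
        *y = *y + *first;
}

// Container-level convenience wrapper; the two containers may differ in type.
template <class C1, class C2>
typename C1::value_type dot(const C1& x, const C2& y) {
    assert(x.size() == y.size());  // precondition: equal lengths
    return generic_dot(x.begin(), x.end(), y.begin(),
                       typename C1::value_type());
}
```

Under these assumptions, `dot(x, y)` accepts a `std::vector<double>` and a `std::list<double>` without any virtual dispatch, and `generic_add(scaled(x.begin(), a), scaled(x.end(), a), y.begin())` expresses the update y <- y + a*x without a temporary for a*x. Because every type is known at compile time, an optimizing compiler is free to inline the adapter and function object into the loop body, which is the mechanism the abstract appeals to when it says flexibility need not cost performance.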
