Performance Prediction through Time Measurements

In this article we address the problem of predicting the performance of linear algebra algorithms for small matrices. The approach reduces performance prediction to modeling the execution time of algorithms. The execution time of higher-level algorithms, such as the LU factorization, is predicted by modeling the computational time of the kernel linear algebra operations they are built on, such as the BLAS subroutines. Time measurements confirmed that the execution time of the BLAS subroutines follows a piecewise-polynomial behavior. Therefore, the execution time of each subroutine is modeled by taking only a few samples and then applying polynomial interpolation. The approach is validated by comparing the predicted execution time of the unblocked LU factorization, which is built on top of two BLAS subroutines, with its separately measured execution time. The applicability of the approach is illustrated through performance experiments on Intel and AMD processors.
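The modeling scheme described above can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions, not the article's implementation. The timing samples are synthetic (the hypothetical functions `fake_scal` and `fake_ger`), and the breakpoint at size 64 and the sample sizes are arbitrary. We also assume that the unblocked LU is the right-looking variant without pivoting, built on a column scaling (scal) and a rank-1 update (ger), and that the cost of the rank-1 update depends only on the trailing size m. Within each piece, the samples are interpolated with Lagrange's formula. The names `PiecewisePolyModel` and `predict_unblocked_lu` are hypothetical.

```python
from bisect import bisect_right


def lagrange_eval(xs, ys, x):
    """Evaluate at x the polynomial of degree len(xs)-1 through (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


class PiecewisePolyModel:
    """Hypothetical model t(n) built from a few timing samples per piece.

    `pieces` holds one list of (n, seconds) samples per interval of n, in
    increasing order of n; each piece is interpolated by one polynomial.
    """

    def __init__(self, pieces):
        self.pieces = [([n for n, _ in p], [t for _, t in p]) for p in pieces]
        # First sampled size of each piece, used to select the piece for a query.
        self.starts = [xs[0] for xs, _ in self.pieces]

    def predict(self, n):
        # Last piece whose first sample is <= n (the first piece for smaller n).
        k = max(bisect_right(self.starts, n) - 1, 0)
        xs, ys = self.pieces[k]
        return lagrange_eval(xs, ys, n)


def predict_unblocked_lu(n, scal_model, ger_model):
    """Predicted time of an unblocked right-looking LU of an n x n matrix.

    Assumption: step k scales a column of length m = n-k-1 (scal) and applies
    an m x m rank-1 update (ger); loop overhead is ignored.
    """
    total = 0.0
    for k in range(n - 1):
        m = n - k - 1
        total += scal_model.predict(m) + ger_model.predict(m)
    return total


# Hypothetical "measured" timings from made-up formulas, so the example runs
# anywhere; real samples would come from timing the actual BLAS calls.
def fake_scal(m):
    return 5e-8 + 2e-9 * m


def fake_ger(m):
    # A different quadratic below and above m = 64 (e.g. a cache effect).
    if m < 64:
        return 1e-7 + 3e-9 * m + 1.0e-9 * m * m
    return 4e-7 + 1e-9 * m + 1.5e-9 * m * m


scal = PiecewisePolyModel([[(m, fake_scal(m)) for m in (1, 128)]])
ger = PiecewisePolyModel([
    [(m, fake_ger(m)) for m in (1, 32, 63)],
    [(m, fake_ger(m)) for m in (64, 96, 128)],
])

n = 100
predicted = predict_unblocked_lu(n, scal, ger)
reference = sum(fake_scal(n - k - 1) + fake_ger(n - k - 1) for k in range(n - 1))
print(f"predicted {predicted:.3e} s, reference {reference:.3e} s")
```

Because the synthetic timings are exactly piecewise polynomial, degree + 1 samples per piece reproduce them, so the prediction matches the reference. With noisy real measurements, a least-squares fit over somewhat more samples per piece may be more robust than exact interpolation.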
