Realistic Performance Prediction Tool for the Parallel Block LU Factorization Algorithm

This work describes a realistic performance prediction tool for the parallel block LU factorization algorithm. The tool accounts for the computational workload, communication costs, and the overlap of communication with useful computation. We also discuss the estimation of the tool's parameters and benchmarking. Using the tool, we develop a simple heuristic for scheduling LU factorization tasks. Results of numerical experiments are presented.
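To make the three ingredients named above concrete (computation, communication, and their overlap), the following is a minimal illustrative sketch and not the tool described in this work. It assumes a linear message-cost model (a start-up latency plus a per-byte cost) and a single `overlap` fraction for the part of communication hidden behind computation. The function names and parameters are hypothetical.

```python
def message_time(n_bytes, latency, time_per_byte):
    """Assumed linear cost of sending one message: start-up plus per-byte cost."""
    return latency + time_per_byte * n_bytes


def step_time(t_comp, t_comm, overlap=1.0):
    """Predicted time of one step of a parallel algorithm.

    `overlap` (0..1) is the assumed fraction of communication that can be
    hidden behind computation; the hidden part can never exceed t_comp.
    """
    hidden = min(overlap * t_comm, t_comp)
    return t_comp + t_comm - hidden


if __name__ == "__main__":
    # Hypothetical parameters: 100 microsecond latency, 10 ns per byte.
    t_comm = message_time(1_000_000, 1e-4, 1e-8)
    for overlap in (0.0, 0.5, 1.0):
        print(overlap, step_time(t_comp=0.02, t_comm=t_comm, overlap=overlap))
```

With `overlap=0.0` the step costs the sum of computation and communication; with `overlap=1.0` it costs the larger of the two. In practice the latency, per-byte cost, and overlap fraction would have to be estimated from benchmarks on the target machine.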
