On the parallelization of blocked LU factorization algorithms on distributed memory architectures

The authors present the parallelization of blocked algorithms for LU factorization. They isolate problems inherent in sequential blocked algorithms and provide approaches to overcome them on distributed-memory architectures. The performance of the parallelized versions of three blocked algorithms suited to column-oriented Fortran is compared. Experiments are performed on the iPSC/860 hypercube. The results show that it is not intuitively clear which algorithm will perform best on a given architecture; this depends on the problem size and the number of available parameters.
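The abstract does not name the three blocked algorithms. As a rough illustration of what "blocked" LU means, the sketch below assumes a right-looking variant, which is only one of several possible orderings. It factors a narrow column panel with unblocked elimination, solves for the corresponding row block of U, and then applies most of the work as a single matrix-matrix update of the trailing submatrix. This sequential sketch is not the authors' method. The function name `blocked_lu` and the block size `nb` are hypothetical, and the code omits pivoting and any data distribution across processors.

```python
import numpy as np


def blocked_lu(A, nb=2):
    """Hypothetical right-looking blocked LU without pivoting (illustrative sketch).

    Returns a copy of A overwritten with the factors: the strict lower triangle
    holds L (unit diagonal implied) and the upper triangle holds U.
    Assumes every leading principal minor is nonzero (no pivoting is done).
    """
    A = np.array(A, dtype=float, copy=True)
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)  # current block covers columns k..e-1
        # 1. Unblocked LU of the tall panel A[k:, k:e] (column by column).
        for j in range(k, e):
            A[j + 1:, j] /= A[j, j]
            A[j + 1:, j + 1:e] -= np.outer(A[j + 1:, j], A[j, j + 1:e])
        if e < n:
            # 2. Row block of U: solve L11 @ U12 = A12, with L11 unit lower triangular.
            L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
            # 3. Trailing update A22 -= L21 @ U12 (the matrix-matrix, BLAS-3-style step).
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 7
    # Strictly diagonally dominant, so LU without pivoting is safe here.
    M = rng.random((n, n)) + n * np.eye(n)
    F = blocked_lu(M, nb=3)
    L = np.tril(F, -1) + np.eye(n)
    U = np.triu(F)
    print(np.allclose(L @ U, M))
```

With `nb=1`, the loop reduces to ordinary unblocked Gaussian elimination. Larger `nb` moves most of the arithmetic into step 3, and step 3 is the part that benefits from blocking. Which ordering of these steps performs best once the matrix is distributed is the question the abstract describes as problem-size and machine dependent.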