Performance Prediction and Analysis of Parallel Out-Of-Core Matrix Factorization

In this paper, we present an analytical performance model of the parallel left-right looking out-of-core LU factorization algorithm. We demonstrate the accuracy of its performance predictions for a prototype implementation in the ScaLAPACK library. We show that, with a correct distribution of the matrix and with I/O overlapped by computation, the out-of-core algorithm achieves performance similar to that of the in-core algorithm. To reach this performance, the size of the physical main memory only needs to be proportional to the product of the matrix order (not the matrix size) and the ratio of the I/O bandwidth and the computation rate. A large main memory is therefore not required to factor a huge matrix.