Performance Experiments and Optimizations of PDE Sparse Solvers on Hypercubes

In this report we present the results of experiments with the parallel sparse matrix solver of the Parallel Ellpack System. Three different hypercube parallel machines are used to compare and optimize its performance. After a brief description of the parallel sparse matrix solver and a presentation of the machine parameters and features, the measured performance of the sparse solver on the three machines is compared. We observe that the performance of an algorithm is architecture dependent. The program achieves nearly perfect speedup on the NCUBE/2 and the Intel iPSC/2, but its speedup on the Intel iPSC/860 is disappointing for small problems. Two bottlenecks responsible for this inefficiency are located, and a block-wrapping assignment and message merging method is devised to raise the computation granularity and reduce the communication overhead. Our experiments show that the method is very effective in improving performance on the iPSC/860 when the problem size is small. The NCUBE/2 turns out to be the most balanced design for the problem. Despite its lower efficiency, the iPSC/860 is seen to be far more powerful in terms of absolute speed.