Optimal parallelization of a recursive algorithm for triangular matrix inversion on MIMD computers

Abstract This paper studies the parallelization of a recursive algorithm for triangular matrix inversion (TMI), using the “divide and conquer” paradigm. For a (large scale) matrix of size n=m2 k (m,k⩾1) and p=2 q (⩽n/2) available processors, we first construct an adequate 2-phases task segmentation and inducing a balanced layered task graph. Then, we design a greedy scheduling leading to a cost optimal parallel algorithm, i.e. whose efficiency is equal to 1 for large n. The practical interest of the contribution is proven through an experimental study of two versions of the original algorithm on an IBM SP1 distributed memory multiprocessor.