A parallel triangular solver for distributed-memory multiprocessor

We consider solving triangular systems of linear equations on a distributed-memory multiprocessor which allows for a ring embedding. Specifically, we propose a parallel algorithm, applicable when the triangular matrix is distributed by column in a wrap fashion. Numerical experiments indicate that the new algorithm is very efficient in some circumstances (in particular, when the size of the problem is sufficiently large relative to the number of processors).A theoretical analysis confirms that the total running time varies linearly, with respect to the matrix order, up to a threshold value of the matrix order, after which the dependence is quadratic. Moreover, we show that total message traffic is essentially the minimum possible.Finally, we describe an analogous row-oriented algorithm.