Performance analysis in parallel triangular solver

Performance analysis plays an important part in the design and implementation of parallel algorithms. The main reason is that the complexity of parallel computer architectures and the difficulty of partitioning most applications into tasks make it hard to extract maximal performance. In this paper, we focus on the parallel solution of a sparse, well-structured lower triangular system and on its performance analysis. Two task partitioning methods are discussed, together with their task assignment and task scheduling. Their estimated parallel execution times are derived using a performance model and a performance evaluation methodology for parallel algorithms. The optimal task granularities are deduced theoretically from the performance analysis. Experimental results on a transputer-based multicomputer are given.