LU decomposition on a multiprocessing system with communications delay

A large amount of computer time is spent solving systems of linear equations during circuit simulation in the design of integrated circuits. This expenditure limits the size of circuits that can be practically simulated and results in poor response time in an interactive environment. To increase the size of circuits that can be simulated and to improve the response time, one option pursued here is to apply concurrent computation to the linear equation solution aspect of circuit simulation. This concurrent computation exploits the inherent parallelism in the linear equation solution to reduce the time required for that solution. We focus on one particular method for solving the linear equations: LU decomposition. While LU decomposition has a great deal of inherent parallelism, the wide range of sparse matrix structures requires that this parallelism be detected automatically. The overall speedup has been found to be sensitive to the delays between cooperating computational elements, so the manner in which the concurrent computations are mapped onto computational elements is important. The approach used is as follows. Given a sparse matrix with a particular structure, a code generator produces a program representing the LU decomposition for that matrix. Another program detects the precedence constraints among the sequential instructions in the code and models the solution process as a directed graph. Based on this graph, scheduling techniques are employed to assign segments of code to computational elements for concurrent execution. Most of this thesis concentrates on the last problem: finding scheduling algorithms that reduce the sensitivity of the solution time to the communication delay among computational elements. This is based on the following observation. With zero delay, the commonly used level scheduling algorithm of Hu gives good speedup performance.
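The three stages described above (code generation, precedence detection, level scheduling) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: all function names are hypothetical, elimination is done without pivoting, every instruction takes one time step, communication delay is zero, and dependencies are tracked only through the last writer of each matrix entry.

```python
def generate_lu_ops(n, nonzeros):
    """Emit straight-line LU instructions for a sparsity pattern (no pivoting).

    Each instruction is (kind, target_entry, entries_read).
    """
    nz = set(nonzeros)
    ops = []
    for k in range(n):
        rows = [i for i in range(k + 1, n) if (i, k) in nz]
        cols = [j for j in range(k + 1, n) if (k, j) in nz]
        for i in rows:
            # a[i][k] /= a[k][k]
            ops.append(("div", (i, k), [(i, k), (k, k)]))
            for j in cols:
                nz.add((i, j))  # record fill-in, if any
                # a[i][j] -= a[i][k] * a[k][j]
                ops.append(("upd", (i, j), [(i, j), (i, k), (k, j)]))
    return ops


def build_preds(ops):
    """Precedence graph: each instruction depends on the last writer of every entry it reads."""
    last_writer = {}
    preds = []
    for idx, (_kind, target, reads) in enumerate(ops):
        preds.append({last_writer[e] for e in reads if e in last_writer})
        last_writer[target] = idx
    return preds


def hu_levels(preds):
    """Level of a node = number of nodes on its longest path to a sink."""
    level = [1] * len(preds)
    # Instructions are already in topological order (predecessors have smaller indices).
    for v in reversed(range(len(preds))):
        for p in preds[v]:
            level[p] = max(level[p], level[v] + 1)
    return level


def level_schedule(preds, level, m):
    """Zero-delay level scheduling: each step runs up to m ready instructions, highest level first."""
    finished, remaining, steps = set(), set(range(len(preds))), []
    while remaining:
        ready = sorted((v for v in remaining if preds[v] <= finished),
                       key=lambda v: (-level[v], v))
        chosen = ready[:m]
        steps.append(chosen)
        finished.update(chosen)
        remaining.difference_update(chosen)
    return steps


# Usage: a dense 3x3 pattern, scheduled on 1, 2 and 4 processors.
dense3 = [(i, j) for i in range(3) for j in range(3)]
ops = generate_lu_ops(3, dense3)
preds = build_preds(ops)
level = hu_levels(preds)
for m in (1, 2, 4):
    print(m, "processors:", len(level_schedule(preds, level, m)), "steps")
```

The largest level is the length of the critical path, which bounds the number of steps from below no matter how many processors are available.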
However, when the communication delay is large compared to the execution time of an instruction in the code, considerable degradation in speedup performance is observed for Hu's algorithm. Optimal scheduling appears to be intractable; no polynomial-time algorithm is apparent. Hence heuristic algorithms that run in feasible time but yield suboptimal schedules have to be constructed. This is approached in two different ways. Heuristic local minimization scheduling algorithms using two matching algorithms from combinatorial optimization are studied, and promising results are obtained. These two matching algorithms, min-max matching and weighted matching, give an optimal code-to-processor assignment at each time step. . . . (Author's abstract exceeds stipulated maximum length. Discontinued here with permission of author.) UMI
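The sensitivity of level scheduling to communication delay can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the thesis's experiment: the task graph `TASK_PREDS` is hypothetical, every instruction takes one time unit, a delay `d` is charged only when a result crosses from one processor to another, and the schedule is first built as if the delay were zero and then replayed with the delay applied.

```python
# Hypothetical precedence graph: TASK_PREDS[v] is the set of instructions v waits for.
TASK_PREDS = [set(), {0}, {0}, set(), {3}, {3}, {1, 4}, {2, 5, 6}]


def hu_levels(preds):
    """Level of a node = number of nodes on its longest path to a sink."""
    level = [1] * len(preds)
    # Assumes predecessors always have smaller indices than their successors.
    for v in reversed(range(len(preds))):
        for p in preds[v]:
            level[p] = max(level[p], level[v] + 1)
    return level


def zero_delay_placement(preds, level, m):
    """Level scheduling that ignores delay; returns {instruction: (step, processor)}."""
    finished, remaining, placement = set(), set(range(len(preds))), {}
    step = 0
    while remaining:
        ready = sorted((v for v in remaining if preds[v] <= finished),
                       key=lambda v: (-level[v], v))
        chosen = ready[:m]
        for proc, v in enumerate(chosen):
            placement[v] = (step, proc)
        finished.update(chosen)
        remaining.difference_update(chosen)
        step += 1
    return placement


def makespan_with_delay(preds, placement, m, d):
    """Replay a fixed placement when cross-processor results arrive d time units late."""
    finish = {}
    proc_free = [0] * m
    # Visiting in (step, processor) order guarantees predecessors are replayed first.
    for v in sorted(placement, key=lambda v: placement[v]):
        proc = placement[v][1]
        data_ready = max(
            (finish[u] + (0 if placement[u][1] == proc else d) for u in preds[v]),
            default=0,
        )
        start = max(proc_free[proc], data_ready)
        finish[v] = start + 1  # unit execution time
        proc_free[proc] = finish[v]
    return max(finish.values())


levels = hu_levels(TASK_PREDS)
placement = zero_delay_placement(TASK_PREDS, levels, 2)
for d in (0, 3, 5):
    print("delay", d, "-> makespan", makespan_with_delay(TASK_PREDS, placement, 2, d))
```

In this toy graph the two-processor makespan grows with `d`, and for a large enough delay it exceeds the eight time units a single processor would need, which is the kind of degradation that motivates delay-aware heuristics.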