Parallelization and performance evaluation of circuit simulation on a shared-memory multiprocessor

Circuit simulation is a widely used but computationally demanding tool for VLSI design. This paper addresses the considerations involved in improving its performance through parallelization on a shared-memory multiprocessor. The two components that dominate the computational cost of circuit simulation, matrix assembly and sparse matrix solution, raise very different parallelization issues. Matrix assembly is parallelized as a sequence of lock-synchronized parallel loops; a theoretical prediction of the performance of such loops is developed and compared with measured performance on a variety of circuits. Two approaches to parallel sparse matrix solution are contrasted: 1) an efficient implementation of an earlier proposed fine-grained model that captures parallelism at the level of elemental operations, and 2) a newly proposed medium-grained scheme that represents the computation at the level of row operations. A performance-evaluation framework is developed to interpret measured speedup in terms of the relevant factors. While the fine-grained approach achieves somewhat better load balancing and slightly lower scheduling overhead through judicious task clustering, the medium-grained approach is shown to be consistently superior for large circuit matrices because of its lower operand-access costs and better vectorization potential. The techniques developed have been incorporated into a prototype parallel implementation of the production circuit simulator ADVICE on the Alliant FX/8 multiprocessor.
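To make the first idea concrete, the following is a minimal sketch (not from the paper) of lock-synchronized parallel matrix assembly: each thread processes a chunk of device "stamps", and a per-entry lock serializes conflicting updates to shared matrix entries. The `SharedMatrix` class, its sparse dictionary storage, and the device-tuple format are all illustrative assumptions, not the authors' data structures.

```python
import threading

class SharedMatrix:
    """Hypothetical shared sparse matrix with one lock per touched entry."""
    def __init__(self):
        self.values = {}                  # sparse storage: (row, col) -> value
        self.locks = {}                   # (row, col) -> threading.Lock
        self.meta_lock = threading.Lock() # guards lazy creation of entry locks

    def _lock_for(self, key):
        with self.meta_lock:
            return self.locks.setdefault(key, threading.Lock())

    def stamp(self, row, col, contribution):
        # Lock-synchronized update: devices sharing a node stamp the
        # same entry, so the accumulation must be serialized.
        with self._lock_for((row, col)):
            self.values[(row, col)] = self.values.get((row, col), 0.0) + contribution

def assemble(matrix, devices, nthreads=4):
    """Parallel loop over devices; each device contributes one stamp here."""
    def worker(chunk):
        for row, col, val in chunk:
            matrix.stamp(row, col, val)
    chunks = [devices[i::nthreads] for i in range(nthreads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In Python the interpreter lock already serializes the arithmetic, so this only illustrates the synchronization pattern; on a machine like the FX/8 the same structure would use hardware locks around genuinely concurrent updates.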

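The distinction between the two sparse-solution granularities can also be sketched. In the medium-grained view, the schedulable unit of work is a whole row operation, target_row -= multiplier * pivot_row, rather than a single multiply-subtract (the fine-grained elemental operation). The row-of-dicts storage and the sequential driver below are illustrative assumptions for a sketch, not the paper's implementation; in the parallel scheme, independent `row_update` tasks for the same pivot would be dispatched to different processors.

```python
def row_update(rows, target, pivot):
    """One medium-grained task: eliminate rows[target][pivot] using the pivot row.
    The multiplier is stored in place, forming the L factor of an in-place LU."""
    m = rows[target][pivot] / rows[pivot][pivot]
    rows[target][pivot] = m
    for col, v in rows[pivot].items():
        if col > pivot:  # update only the strictly upper part of the target row
            rows[target][col] = rows[target].get(col, 0.0) - m * v

def lu_in_place(rows, n):
    """Sequential driver over pivots; rows is a list of {col: value} dicts."""
    for k in range(n):
        for i in range(k + 1, n):
            if k in rows[i]:  # sparsity: skip rows with no entry in column k
                row_update(rows, i, k)
```

Because each task streams over a contiguous pivot row, this granularity keeps operand accesses local and leaves the inner loop in a form a vectorizing compiler can exploit, which is the intuition behind the medium-grained scheme's advantage on large matrices.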