NICSLU: An Adaptive Sparse Matrix Solver for Parallel Circuit Simulation

The sparse matrix solver has become a bottleneck in SPICE-like (Simulation Program with Integrated Circuit Emphasis) circuit simulators. The solver is difficult to parallelize because of the strong data dependency in numeric LU factorization and the irregular structure of circuit matrices. This paper proposes an adaptive sparse matrix solver called NICSLU, which accelerates circuit simulation with a multithreaded parallel LU factorization algorithm on shared-memory computers with multicore/multisocket central processing units (CPUs). The solver can be used in any SPICE-like circuit simulator. A simple method is proposed to predict whether a matrix is suitable for parallel factorization, so that each matrix can achieve its best performance. Experimental results on 35 matrices show that, for the matrices suitable for the parallel algorithm, NICSLU achieves geometric-mean speedups of 2.08× to 8.57× over KLU when using 1 to 12 threads. NICSLU can be downloaded from http://nicslu.weebly.com.
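The data dependency mentioned above can be illustrated with a small sketch. In a column-based (left-looking) LU factorization, column k can only be computed after every earlier column j for which U(j, k) is nonzero. Grouping columns into dependency levels shows how much parallelism a given sparsity pattern exposes. This is a generic level-scheduling illustration, not NICSLU's actual algorithm or API: the function names and the example pattern are hypothetical, and the nonzero pattern of U is assumed to be known in advance.

```python
from collections import defaultdict


def column_levels(n, upper_nonzeros):
    """Assign each column a dependency level.

    upper_nonzeros: iterable of (j, k) pairs with j < k, meaning U[j, k] != 0,
    i.e. column k depends on column j. (Hypothetical input format.)
    """
    deps = defaultdict(list)
    for j, k in upper_nonzeros:
        if not (0 <= j < k < n):
            raise ValueError(f"invalid strictly-upper entry ({j}, {k})")
        deps[k].append(j)

    level = [0] * n
    # Columns are visited in increasing order, so every dependency
    # (which has a smaller index) already has its level assigned.
    for k in range(n):
        if deps[k]:
            level[k] = 1 + max(level[j] for j in deps[k])
    return level


def group_by_level(level):
    """Columns within one group have no mutual dependencies."""
    groups = defaultdict(list)
    for k, lv in enumerate(level):
        groups[lv].append(k)
    return [groups[lv] for lv in sorted(groups)]


# Made-up 5-column pattern: column 2 needs columns 0 and 1,
# column 4 needs columns 2 and 3.
pattern = [(0, 2), (1, 2), (2, 4), (3, 4)]
levels = column_levels(5, pattern)
groups = group_by_level(levels)
print(groups)
```

Columns in the same group could in principle be factorized concurrently. A pattern with few, wide levels offers more parallelism than a long chain in which every column depends on its predecessor, which is one intuition for why some matrices suit parallel factorization better than others.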
