Paper Information - Accelerating the SPICE Circuit Simulator Using an FPGA: A Case Study

Spatial processing of sparse, irregular, double-precision floating-point computation using a single FPGA enables up to an order of magnitude speedup and energy savings over a conventional microprocessor for the Simulation Program with Integrated Circuit Emphasis (SPICE) circuit simulator. We develop a parallel, FPGA-based, heterogeneous architecture customized for accelerating the SPICE simulator to deliver this speedup. To properly parallelize the complete simulator, we decompose SPICE into its three constituent phases (Model Evaluation, Sparse Matrix-Solve, and Iteration Control) and customize a spatial architecture for each phase independently. Our heterogeneous FPGA organization mixes very long instruction word (VLIW), dataflow, and streaming architectures into a cohesive, unified design. We program this parallel architecture with a high-level, domain-specific framework that identifies, exposes, and exploits the parallelism available in the SPICE circuit simulator using streaming (SCORE framework), data-parallel (Verilog-AMS models), and dataflow (KLU matrix solver) patterns. Our FPGA architecture is able to outperform conventional processors due to a combination of factors, including high utilization of statically scheduled resources, low-overhead dataflow scheduling of fine-grained tasks, and streaming, overlapped processing of the control algorithms. We expect approaches based on exploiting spatial parallelism to become important as frequency scaling continues to slow down and modern processing architectures turn to parallelism (e.g., multi-core processors and GPUs) due to power-consumption constraints.

André DeHon | Nachiket Kapre

[1] Nachiket Kapre,et al. GraphStep: A System Architecture for Sparse-Graph Algorithms , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[2] Albert E. Ruehli,et al. The modified nodal approach to network analysis , 1975 .

[3] Joseph A. Fisher. The VLIW Machine: A Multiprocessor for Compiling Scientific Code , 1984, Computer.

[4] John Wawrzynek,et al. Design automation for streaming systems , 2005 .

[5] Andrew B. Kahng,et al. Improved algorithms for hypergraph bipartitioning , 2000, ASP-DAC '00.

[6] Yasser Y. Hanafy,et al. Massive parallelization of SPICE device model evaluation on GPU-based SIMD architectures , 2008, IFMT '08.

[7] Teresa H. Y. Meng,et al. Towards program optimization through automated analysis of numerical precision , 2010, CGO '10.

[8] David Bryan,et al. Combinational profiles of sequential benchmark circuits , 1989, IEEE International Symposium on Circuits and Systems.

[9] J. Gilbert,et al. Sparse Partial Pivoting in Time Proportional to Arithmetic Operations , 1986 .

[10] Eric R. Keiter,et al. The Xyce Parallel Electronic Simulator - An Overview , 2000 .

[11] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .

[12] Chung-Kuan Cheng,et al. Parallel transistor level circuit simulation using domain decomposition methods , 2009, 2009 Asia and South Pacific Design Automation Conference.

[13] David M. Lewis,et al. A compiled-code hardware accelerator for circuit simulation , 1992, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[14] David A. Patterson,et al. Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .

[15] George Ho,et al. PAPI: A Portable Interface to Hardware Performance Counters , 1999 .

[16] Nachiket Kapre,et al. FX-SCORE: A Framework for Fixed-Point Compilation of SPICE Device Models Using Gappa++ , 2012, 2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines.

[17] Ralph Wittig,et al. Performance and power of cache-based reconfigurable computing , 2009, FPGA '09.

[18] Nachiket Kapre,et al. Packet Switched vs. Time Multiplexed FPGA Overlay Networks , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[19] A. DeHon,et al. Parallelizing sparse Matrix Solve for SPICE circuit simulation using FPGAs , 2009, 2009 International Conference on Field-Programmable Technology.

[20] David E. Culler,et al. Monsoon: an explicit token-store architecture , 1998, ISCA '98.

[21] L. Lemaitre,et al. Extensions to Verilog-A to support compact device modeling , 2003, Proceedings of the 2003 IEEE International Workshop on Behavioral Modeling and Simulation.

[22] Guy Lemieux,et al. Towards reliable 5Gbps wave-pipelined and 3Gbps surfing interconnect in 65nm FPGAs , 2009, FPGA '09.

[23] Sudhakar Yalamanchili,et al. Interconnection Networks: An Engineering Approach , 2002 .

[24] Ekanathan Palamadai Natarajan,et al. KLU: A High Performance Sparse Linear Solver for Circuit Simulation Problems , 2005 .

[25] Stylianos Perissakis,et al. Stream computations organized for reconfigurable execution , 2006, Microprocess. Microsystems.

[26] Nachiket Kapre,et al. VLIW-SCORE: Beyond C for sequential control of SPICE FPGA acceleration , 2011, 2011 International Conference on Field-Programmable Technology.

[27] Sunil P. Khatri,et al. Fast circuit simulation on graphics processing units , 2009, 2009 Asia and South Pacific Design Automation Conference.

[28] Nachiket Kapre,et al. Accelerating SPICE Model-Evaluation using FPGAs , 2009, 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines.

[29] Qiang Wang,et al. Automated field-programmable compute accelerator design using partial evaluation , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).

[30] Nachiket Kapre,et al. Optimistic Parallelization of Floating-Point Accumulation , 2007, 18th IEEE Symposium on Computer Arithmetic (ARITH '07).

[31] Goichi Yokomizo,et al. A parallel and accelerated circuit simulator with precise accuracy , 2002, Proceedings of ASP-DAC/VLSI Design 2002. 7th Asia and South Pacific Design Automation Conference and 15th International Conference on VLSI Design.

[32] Sani R. Nassif,et al. MAPS: multi-algorithm parallel circuit simulation , 2008, ICCAD 2008.

[33] Nachiket Kapre,et al. Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors , 2009, 2009 International Conference on Field Programmable Logic and Applications.

[34] L. Dagum,et al. OpenMP: an industry standard API for shared-memory programming , 1998 .

[35] Gi-Joon Nam,et al. ISPD 2009 clock network synthesis contest , 2009, ISPD '09.

[36] David M. Lewis. A programmable hardware accelerator for compiled electrical simulation , 1988, 25th ACM/IEEE Design Automation Conference. Proceedings 1988.

[37] Nikil Mehta,et al. Time-Multiplexed FPGA Overlay Networks on Chip , 2006 .

[38] Florent de Dinechin,et al. When FPGAs are better at floating-point than microprocessors , 2008, FPGA '08.