CASTA: CUDA-Accelerated Static Timing Analysis for VLSI Designs

General-purpose computing on graphics processing units (GPGPU) makes parallel Static Timing Analysis (STA) of VLSI designs possible. However, memory access and synchronization among a massive number of cores make STA challenging to parallelize. In this work, we develop a fast CUDA-accelerated STA engine, named CASTA, that incorporates four novel techniques to enable high parallelism: Table-Index Remapping (TIR), Texture-Accelerated Rendering (TAR), Cell Levelization & Type Sorting (CLTS), and Timing-Table Restructuring (TTR). CLTS levelizes cells and sorts them by type so that the same timing library can be accessed efficiently. TTR modifies the data structure for the timing signals of cells to increase memory throughput. TIR remaps the axes of timing tables to retrieve data more efficiently, while TAR expands look-up tables (LUTs) to avoid extrapolation and stores the LUTs in texture memory for speed. Our experimental results indicate that CASTA successfully enables high parallelism and outperforms a commercial tool by three orders of magnitude on average over several benchmark circuits.
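To make the idea behind CLTS concrete, the following is a minimal host-side sketch in Python, not CASTA's actual implementation. It assigns each cell a topological level (so that all cells in one level could be timed in parallel) and then orders the cells within each level by type, so that neighbouring work items look up the same library tables. The function name `levelize_and_sort`, the netlist representation (a dict of cell types plus a fanout dict), and the cell names are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def levelize_and_sort(cells, fanout):
    """Group cells into topological levels, sorted by cell type within a level.

    cells:  dict mapping cell name -> cell type (hypothetical netlist format)
    fanout: dict mapping cell name -> list of cell names it drives
    Returns a list of levels; each level is a list of cell names.
    """
    # Count how many driving cells each cell is still waiting for.
    indegree = {name: 0 for name in cells}
    for sinks in fanout.values():
        for dst in sinks:
            indegree[dst] += 1

    # Cells without drivers start at level 0.
    level = {name: 0 for name, deg in indegree.items() if deg == 0}
    queue = deque(level)
    processed = 0

    while queue:
        src = queue.popleft()
        processed += 1
        for dst in fanout.get(src, ()):
            # A cell sits one level after its deepest driver.
            level[dst] = max(level.get(dst, 0), level[src] + 1)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)

    if processed != len(cells):
        raise ValueError("netlist contains a combinational loop")

    by_level = defaultdict(list)
    for name, lv in level.items():
        by_level[lv].append(name)

    # Within a level, place cells of the same type next to each other.
    return [sorted(by_level[lv], key=lambda n: (cells[n], n))
            for lv in sorted(by_level)]


# Hypothetical five-cell netlist used only to exercise the sketch.
cells = {"u1": "NAND2", "u2": "INV", "u3": "NAND2", "u4": "INV", "u5": "NOR2"}
fanout = {"u1": ["u4"], "u2": ["u4", "u5"], "u3": ["u5"], "u4": ["u5"]}
print(levelize_and_sort(cells, fanout))
# prints [['u2', 'u1', 'u3'], ['u4'], ['u5']]
```

In a GPU setting, each inner list would correspond to one batch of independent cells, with a synchronization point between consecutive levels; how CASTA maps these batches onto CUDA threads and blocks is not specified here.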
