Tatum: Parallel Timing Analysis for Faster Design Cycles and Improved Optimization

Static Timing Analysis (STA) is used to evaluate the correctness and performance of a digital circuit implementation. Beyond final sign-off checks, STA is invoked many times during placement and routing to guide optimization. As a result, STA consumes a significant fraction of design implementation time, so reducing FPGA compile times requires faster STA. We evaluate the suitability of both GPU and multi-core CPU platforms for accelerating STA. On core STA algorithms our GPU kernel achieves a 6.2× kernel speed-up, but data transfer overhead reduces this to 0.9× overall, a net slowdown. Our best CPU implementation achieves a 9.2× parallel speed-up on 32 cores, yielding a 15.2× overall speed-up compared to the VPR analyzer, and a 6.9× larger parallel speed-up than a recent parallel ASIC timing analyzer. We then show how the reduced run-time cost of STA can be leveraged to improve optimization quality, reducing critical path delay by 4%.
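For readers unfamiliar with the core STA algorithms mentioned above, the following is a minimal sketch of a levelized timing-graph traversal, written in Python for illustration. It is not Tatum's implementation: the function names `levelize` and `analyze`, the edge-list representation, and the single-clock setup model (all sources launch at time 0, all sinks must arrive by `clock_period`) are assumptions made for this example. It shows why the problem is amenable to parallel hardware: nodes in the same level have no dependencies on each other, so each inner loop over a level could be distributed across cores.

```python
from collections import defaultdict


def levelize(num_nodes, edges):
    """Group the nodes of a timing DAG into topological levels."""
    fanout = defaultdict(list)
    indegree = [0] * num_nodes
    for src, dst, _delay in edges:
        fanout[src].append(dst)
        indegree[dst] += 1
    level = [n for n in range(num_nodes) if indegree[n] == 0]
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for n in level:
            for m in fanout[n]:
                indegree[m] -= 1
                if indegree[m] == 0:
                    next_level.append(m)
        level = next_level
    return levels


def analyze(num_nodes, edges, clock_period):
    """Return (arrival, required, slack) lists for a hypothetical setup check."""
    fanin = defaultdict(list)
    fanout = defaultdict(list)
    for src, dst, delay in edges:
        fanin[dst].append((src, delay))
        fanout[src].append((dst, delay))
    levels = levelize(num_nodes, edges)

    # Forward pass: arrival time is the latest arrival over all fan-in edges.
    # Nodes within one level are independent of each other.
    arrival = [0.0] * num_nodes
    for level in levels[1:]:
        for n in level:
            arrival[n] = max(arrival[s] + d for s, d in fanin[n])

    # Backward pass: required time is the earliest requirement over fan-out
    # edges. Nodes without fan-out (sinks) keep the clock period.
    required = [clock_period] * num_nodes
    for level in reversed(levels[:-1]):
        for n in level:
            if fanout[n]:
                required[n] = min(required[t] - d for t, d in fanout[n])

    slack = [r - a for r, a in zip(required, arrival)]
    return arrival, required, slack


# Small illustrative graph: edges are (source, sink, delay).
example_edges = [(0, 2, 1.0), (1, 2, 2.0), (2, 3, 1.5), (1, 3, 0.5), (3, 4, 1.0)]
arrival, required, slack = analyze(5, example_edges, clock_period=5.0)
print(arrival)  # [0.0, 0.0, 2.0, 3.5, 4.5]
print(slack)    # [1.5, 0.5, 0.5, 0.5, 0.5]
```

In this example the path 1 → 2 → 3 → 4 has the smallest slack (0.5), so it is the critical path; a timing-driven placer or router would prioritize the connections along it.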
